1.  Integrating big data and actionable health coaching to optimize wellness 
BMC Medicine  2015;13:4.
The Hundred Person Wellness Project (HPWP) is a 10-month pilot study of 100 ‘well’ individuals where integrated data from whole-genome sequencing, gut microbiome, clinical laboratory tests and quantified self measures from each individual are used to provide actionable results for health coaching with the goal of optimizing wellness and minimizing disease. In a commentary in BMC Medicine, Diamandis argues that HPWP and similar projects will likely result in ‘unnecessary and potential harmful over-testing’. We argue that this new approach will ultimately lead to lower costs, better healthcare, innovation and economic growth. The central points of the HPWP are: 1) it is focused on optimizing wellness through longitudinal data collection, integration and mining of individual data clouds, enabling development of predictive models of wellness and disease that will reveal actionable possibilities; and 2) by extending this study to 100,000 well people, we will establish multiparameter, quantifiable wellness metrics and identify markers for wellness to early disease transitions for most common diseases, which will ultimately allow earlier disease intervention, eventually transitioning the individual early on from a disease back to a wellness trajectory.
Please see related commentary:
PMCID: PMC4288554  PMID: 25575752
Wellness; Personalized medicine; Whole-genome sequencing; Health behavior change; Actionable; P4 Medicine; Systems medicine; Gut microbiome
2.  Whole-Genome Sequencing of the World’s Oldest People 
PLoS ONE  2014;9(11):e112430.
Supercentenarians (110 years or older) are the world’s oldest people. Seventy-four are alive worldwide, with twenty-two in the United States. We performed whole-genome sequencing on 17 supercentenarians to explore the genetic basis underlying extreme human longevity. We found no significant evidence of enrichment for a single rare protein-altering variant or for a gene harboring different rare protein-altering variants in supercentenarian compared to control genomes. We followed up on the gene most enriched for rare protein-altering variants in our cohort of supercentenarians, TSHZ3, by sequencing it in a second cohort of 99 long-lived individuals but did not find a significant enrichment. The genome of one supercentenarian had a pathogenic mutation in DSC2, known to predispose to arrhythmogenic right ventricular cardiomyopathy, which is recommended to be reported to this individual as an incidental finding according to a recent position statement by the American College of Medical Genetics and Genomics. Even with this pathogenic mutation, the proband lived to over 110 years. The entire list of rare protein-altering variants and DNA sequence of all 17 supercentenarian genomes is available as a resource to assist the discovery of the genetic basis of extreme longevity in future studies.
PMCID: PMC4229186  PMID: 25390934
3.  P4 medicine: how systems medicine will transform the healthcare sector and society 
Personalized medicine  2013;10(6):565-576.
Ten years ago, the proposition that healthcare is evolving from reactive disease care to care that is predictive, preventive, personalized and participatory was regarded as highly speculative. Today, the core elements of that vision are widely accepted and have been articulated in a series of recent reports by the US Institute of Medicine. Systems approaches to biology and medicine are now beginning to provide patients, consumers and physicians with personalized information about each individual’s unique health experience of both health and disease at the molecular, cellular and organ levels. This information will make disease care radically more cost effective by personalizing care to each person’s unique biology and by treating the causes rather than the symptoms of disease. It will also provide the basis for concrete action by consumers to improve their health as they observe the impact of lifestyle decisions. Working together in digitally powered familial and affinity networks, consumers will be able to reduce the incidence of the complex chronic diseases that currently account for 75% of disease-care costs in the USA.
PMCID: PMC4204402  PMID: 25342952
big data; knowledge network; learning healthcare; new taxonomy of disease; omics studies; P4 medicine; personal data clouds; systems biology; systems medicine; wellness industry
4.  The Human Genome Project: big science transforms biology and medicine 
Genome Medicine  2013;5(9):79.
The Human Genome Project has transformed biology through its integrated big science approach to deciphering a reference human genome sequence along with the complete sequences of key model organisms. The project exemplifies the power, necessity and success of large, integrated, cross-disciplinary efforts - so-called ‘big science’ - directed towards complex major objectives. In this article, we discuss the ways in which this ambitious endeavor led to the development of novel technologies and analytical tools, and how it brought the expertise of engineers, computer scientists and mathematicians together with biologists. It established an open approach to data sharing and open-source software, thereby making the data resulting from the project accessible to all. The genome sequences of microbes, plants and animals have revolutionized many fields of science, including microbiology, virology, infectious disease and plant biology. Moreover, deeper knowledge of human sequence variation has begun to alter the practice of medicine. The Human Genome Project has inspired subsequent large-scale data acquisition initiatives such as the International HapMap Project, 1000 Genomes, and The Cancer Genome Atlas, as well as the recently announced Human Brain Project and the emerging Human Proteome Project.
PMCID: PMC4066586  PMID: 24040834
5.  A unified test of linkage analysis and rare-variant association for analysis of pedigree sequence data 
Nature biotechnology  2014;32(7):663-669.
High-throughput sequencing of related individuals has become an important tool for studying human disease. However, owing to technical complexity and lack of available tools, most pedigree-based sequencing studies rely on an ad hoc combination of suboptimal analyses. Here we present pedigree-VAAST (pVAAST), a disease-gene identification tool designed for high-throughput sequence data in pedigrees. pVAAST uses a sequence-based model to perform variant and gene-based linkage analysis. Linkage information is then combined with functional prediction and rare variant case-control association information in a unified statistical framework. pVAAST outperformed linkage and rare-variant association tests in simulations and identified disease-causing genes from whole-genome sequence data in three human pedigrees with dominant, recessive and de novo inheritance patterns. The approach is robust to incomplete penetrance and locus heterogeneity and is applicable to a wide variety of genetic traits. pVAAST maintains high power across studies of monogenic, high-penetrance phenotypes in a single pedigree to highly polygenic, common phenotypes involving hundreds of pedigrees.
PMCID: PMC4157619  PMID: 24837662
6.  A Blood-Based Proteomic Classifier for the Molecular Characterization of Pulmonary Nodules 
Science translational medicine  2013;5(207):207ra142.
Each year millions of pulmonary nodules are discovered by computed tomography and subsequently biopsied. As the majority of these nodules are benign, many patients undergo unnecessary and costly invasive procedures. We present a 13-protein blood-based classifier that differentiates malignant and benign nodules with high confidence, thereby providing a diagnostic tool to avoid invasive biopsy on benign nodules. Using a systems biology strategy, 371 protein candidates were identified and a multiple reaction monitoring (MRM) assay was developed for each. The MRM assays were applied in a three-site discovery study (n = 143) on plasma samples from patients with benign and Stage IA cancer matched on nodule size, age, gender and clinical site, producing a 13-protein classifier. The classifier was validated on an independent set of plasma samples (n = 104), exhibiting a high negative predictive value (NPV) of 90%. Validation performance on samples from a non-discovery clinical site showed NPV of 94%, indicating the general effectiveness of the classifier. A pathway analysis demonstrated that the classifier proteins are likely modulated by a few transcription regulators (NF2L2, AHR, MYC, FOS) that are associated with lung cancer, lung inflammation and oxidative stress networks. The classifier score was independent of patient nodule size, smoking history and age, which are risk factors used for clinical management of pulmonary nodules. Thus this molecular test can provide a powerful complementary tool for physicians in lung cancer diagnosis.
PMCID: PMC4114963  PMID: 24132637
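The negative predictive value (NPV) quoted in the abstract above is the fraction of "benign" calls that are truly benign. A minimal sketch of that arithmetic follows; the counts are hypothetical and are not taken from the study, and the function name is an assumption made for illustration.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """Return NPV = TN / (TN + FN), the share of negative calls that are correct."""
    total_negative_calls = true_negatives + false_negatives
    if total_negative_calls == 0:
        raise ValueError("NPV is undefined when the classifier makes no negative calls")
    return true_negatives / total_negative_calls


# Hypothetical counts (not from the paper): 45 benign nodules correctly called
# benign, 5 malignant nodules incorrectly called benign.
npv = negative_predictive_value(true_negatives=45, false_negatives=5)
print(f"NPV = {npv:.0%}")
```

NPV depends on how common malignancy is in the tested population, so the same classifier can show a different NPV in a cohort with a different mix of benign and malignant nodules.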
7.  Realistic artificial DNA sequences as negative controls for computational genomics 
Nucleic Acids Research  2014;42(12):e99.
A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at
PMCID: PMC4081056  PMID: 24803667
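To make the "shuffling" baseline mentioned in the abstract above concrete, here is a minimal sketch of a plain mononucleotide shuffle. It is not the authors' generator, and the abstract does not say which shuffling scheme it has in mind; the function names and the toy sequence are assumptions. The sketch shows that a shuffle keeps base composition exactly while generally changing higher-order features such as dinucleotide counts.

```python
import random
from collections import Counter
from typing import Optional


def shuffle_sequence(seq: str, seed: Optional[int] = None) -> str:
    """Return a random permutation of the bases in seq (mononucleotide shuffle)."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def dinucleotide_counts(seq: str) -> Counter:
    """Count overlapping dinucleotides, a simple higher-order feature."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


# Toy sequence containing a tandem repeat (hypothetical, for illustration only).
real = "ACACACACACGGGTTTACACACAC"
shuffled = shuffle_sequence(real, seed=1)

# Base composition is identical by construction.
print(Counter(real) == Counter(shuffled))
# Dinucleotide counts usually differ, because the repeat structure is broken up.
print(dinucleotide_counts(real))
print(dinucleotide_counts(shuffled))
```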
8.  New and improved proteomics technologies for understanding complex biological systems: Addressing a grand challenge in the life sciences 
Proteomics  2012;12(18):2773-2783.
This White Paper sets out a Life Sciences Grand Challenge for Proteomics Technologies to enhance our understanding of complex biological systems, link genomes with phenotypes, and bring broad benefits to the biosciences and the US economy. The paper is based on a workshop hosted by the National Institute of Standards and Technology (NIST) in Gaithersburg, MD, 14–15 February 2011, with participants from many federal R&D agencies and research communities, under the aegis of the US National Science and Technology Council (NSTC). Opportunities are identified for a coordinated R&D effort to achieve major technology-based goals and address societal challenges in health, agriculture, nutrition, energy, environment, national security, and economic development.
PMCID: PMC4005326  PMID: 22807061
Complex systems; Democratization of proteomics; Economic growth; Grand challenges; Integration; Systems biology
9.  Systems Cancer Medicine: Towards Realization of Predictive, Preventive, Personalized, and Participatory (P4) Medicine 
Journal of internal medicine  2012;271(2):111-121.
A grand challenge impeding optimal treatment outcomes for cancer patients arises from the complex nature of the disease: the cellular heterogeneity and the myriad dysfunctional molecular and genetic networks that result from genetic (somatic) and environmental perturbations. Systems biology, with its holistic approach to understanding fundamental principles in biology, and the empowering technologies in genomics, proteomics, single-cell analysis, microfluidics, and computational strategies, enables a comprehensive approach to medicine, which strives to unveil the pathogenic mechanisms of diseases, identify disease biomarkers and begin thinking about new strategies for drug target discovery. The integration of multi-dimensional high throughput “omics” measurements from tumor tissues and corresponding blood specimens, together with new systems strategies for diagnostics, enables the identification of cancer biomarkers that will enable presymptomatic diagnosis, stratification of disease, assessment of disease progression, evaluation of patient response to therapy, and the identification of recurrences. While some aspects of systems medicine are being adopted in clinical oncology practice through companion molecular diagnostics for personalized therapy, the mounting influx of global quantitative data from both wellness and diseases is shaping up a transformational paradigm in medicine we termed predictive, preventive, personalized, and participatory (P4) medicine, which requires new strategies, both scientific and organizational, to enable bringing this revolution in medicine to patients and to the healthcare system. P4 medicine will have a profound impact on society, transforming the healthcare system, turning around the ever escalating costs of healthcare, digitizing the practice of medicine and creating enormous economic opportunities for those organizations and nations that embrace this revolution.
PMCID: PMC3978383  PMID: 22142401
Systems medicine; cancer complexity; quantized cell populations; blood biomarkers; molecular diagnostics; P4 medicine
10.  Revolutionizing medicine in the 21st century through systems approaches 
Biotechnology journal  2012;7(8):992-1001.
Personalized medicine is a term for a revolution in medicine that envisions the individual patient as the central focus of healthcare in the future. The term “personalized medicine”, however, fails to reflect the enormous dimensionality of this new medicine that will be predictive, preventive, personalized, and participatory – a vision of medicine we have termed P4 medicine. This reflects a paradigm change in how medicine will be practiced that is revolutionary rather than evolutionary. P4 medicine arises from the confluence of a systems approach to medicine and from the digitalization of medicine that creates the large data sets necessary to deal with the complexities of disease. We predict that systems approaches will empower the transition from conventional reactive medical practice to a more proactive P4 medicine focused on wellness, and will reverse the escalating costs of drug development and will have enormous social and economic benefits. Our vision for P4 medicine in 10 years is that each patient will be associated with a virtual data cloud of billions of data points and that we will have the information technology for healthcare to reduce this enormous data dimensionality to simple hypotheses about health and/or disease for each individual. These data will be multi-scale across all levels of biological organization and extremely heterogeneous in type – this enormous amount of data represents a striking signal-to-noise (S/N) challenge. The key to dealing with this S/N challenge is to take a “holistic systems approach” to disease as we will discuss in this article.
PMCID: PMC3962497  PMID: 22815171
Functional genomics; Network biology; Personalized medicine; Systems medicine
11.  Relationship Estimation from Whole-Genome Sequence Data 
PLoS Genetics  2014;10(1):e1004144.
The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. 
Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
Author Summary
The determination of the relationship between a pair of individuals is a fundamental application of genetics. The most accurate methods for relationship estimation rely on precise, localized estimates of genetic sharing between individuals. Earlier methods have generated these estimates from high-density genetic marker data. We performed relationship estimation using whole-genome sequence data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. Our results demonstrate that complexities specific to whole-genome sequencing result in regions of the genome that are prone to false-positive estimates of genetic sharing. We provide a map of these spurious IBD regions and introduce new methods, implemented in the software package ERSA 2.0, to control for spurious IBD. We show that ERSA 2.0 provides a 5% to 15% increase in relationship detection power for distant relationships with whole-genome sequence data relative to high-density genetic marker data.
PMCID: PMC3907355  PMID: 24497848
12.  Correction: Optimal Scaling of Digital Transcriptomes 
PLoS ONE  2014;9(1):10.1371/annotation/8b05a9ab-c8ad-4276-a851-1e265055fb65.
PMCID: PMC3885755
13.  Quantitative Liver-Specific Protein Fingerprint in Blood: A Signature for Hepatotoxicity 
Theranostics  2014;4(2):215-228.
We discuss here a new approach to detecting hepatotoxicity by employing concentration changes of liver-specific blood proteins during disease progression. These proteins are capable of assessing the behaviors of their cognate liver biological networks for toxicity or disease perturbations. Blood biomarkers are highly desirable diagnostics as blood is easily accessible and bathes virtually all organs. Fifteen liver-specific blood proteins were identified as markers of acetaminophen (APAP)-induced hepatotoxicity using three proteomic technologies: label-free antibody microarrays, quantitative immunoblotting, and targeted iTRAQ mass spectrometry. Liver-specific blood proteins produced a toxicity signature of eleven elevated and four attenuated blood protein levels. These blood protein perturbations begin to provide a systems view of key mechanistic features of APAP-induced liver injury relating to glutathione and S-adenosyl-L-methionine (SAMe) depletion, mitochondrial dysfunction, and liver responses to the stress. Two markers, elevated membrane-bound catechol-O-methyltransferase (MB-COMT) and attenuated retinol binding protein 4 (RBP4), report hepatic injury significantly earlier than the current gold standard liver biomarker, alanine transaminase (ALT). These biomarkers were perturbed prior to onset of irreversible liver injury. Ideal markers should be applicable for both rodent model studies and human clinical trials. Five of these mouse liver-specific blood markers had human orthologs that were also found to be responsive to human hepatotoxicity. This panel of liver-specific proteins has the potential to effectively identify the early toxicity onset, the nature and extent of liver injury and report on some of the APAP-perturbed liver networks.
PMCID: PMC3900804  PMID: 24465277
liver injury; toxicity; biomarker; RBP4; COMT; CPS1; BHMT.
14.  Participatory medicine: a driving force for revolutionizing healthcare 
Genome Medicine  2013;5(12):110.
PMCID: PMC3978637  PMID: 24360023
15.  Systems Approaches to Biology and Disease Enable Translational Systems Medicine 
Genomics, proteomics & bioinformatics  2012;10(4):10.1016/j.gpb.2012.08.004.
The development and application of systems strategies to biology and disease are transforming medical research and clinical practice at an unprecedented rate. In the foreseeable future, clinicians, medical researchers, and ultimately the consumers and patients will be increasingly equipped with a deluge of personal health information, e.g., whole genome sequences, molecular profiling of diseased tissues, and periodic multi-analyte blood testing of biomarker panels for disease and wellness. The convergence of these practices will enable accurate prediction of disease susceptibility and early diagnosis for actionable preventive schema and personalized treatment regimes tailored to each individual. It will also entail proactive participation from all major stakeholders in the health care system. We are at the dawn of predictive, preventive, personalized, and participatory (P4) medicine, the full implementation of which requires marrying basic and clinical research through advanced systems thinking and the employment of high-throughput technologies in genomics, proteomics, nanofluidics, single-cell analysis, and computation strategies in a highly orchestrated discipline we termed translational systems medicine.
PMCID: PMC3844613  PMID: 23084773
Systems biology; P4 Medicine; Family genome sequencing; Targeted proteomics; Single-cell analysis
16.  Optimal Scaling of Digital Transcriptomes 
PLoS ONE  2013;8(11):e77885.
Deep sequencing of transcriptomes has become an indispensable tool for biology, enabling expression levels for thousands of genes to be compared across multiple samples. Since transcript counts scale with sequencing depth, counts from different samples must be normalized to a common scale prior to comparison. We analyzed fifteen existing and novel algorithms for normalizing transcript counts, and evaluated the effectiveness of the resulting normalizations. For this purpose we defined two novel and mutually independent metrics: (1) the number of “uniform” genes (genes whose normalized expression levels have a sufficiently low coefficient of variation), and (2) low Spearman correlation between normalized expression profiles of gene pairs. We also define four novel algorithms, one of which explicitly maximizes the number of uniform genes, and compared the performance of all fifteen algorithms. The two most commonly used methods (scaling to a fixed total value, or equalizing the expression of certain ‘housekeeping’ genes) yielded particularly poor results, surpassed even by normalization based on randomly selected gene sets. Conversely, seven of the algorithms approached what appears to be optimal normalization. Three of these algorithms rely on the identification of “ubiquitous” genes: genes expressed in all the samples studied, but never at very high or very low levels. We demonstrate that these include a “core” of genes expressed in many tissues in a mutually consistent pattern, which is suitable for use as an internal normalization guide. The new methods yield robustly normalized expression values, which is a prerequisite for the identification of differentially expressed and tissue-specific genes as potential biomarkers.
PMCID: PMC3819321  PMID: 24223126
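The first metric described in the abstract above, the number of "uniform" genes with a sufficiently low coefficient of variation (CV) after normalization, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the CV threshold of 0.25, the function names and the toy counts are all hypothetical, and the normalization shown is the simple "scale to a fixed total" baseline, which the abstract reports as performing poorly on real data.

```python
from statistics import mean, pstdev


def scale_to_total(samples, target=1_000_000.0):
    """Scale each sample (a list of per-gene counts) so its counts sum to target."""
    return [[count * target / sum(sample) for count in sample] for sample in samples]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean; inf if the mean is zero."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else float("inf")


def count_uniform_genes(samples, cv_threshold=0.25):
    """Count genes whose expression across samples has CV <= cv_threshold.

    cv_threshold is an assumed value; the abstract only says "sufficiently low".
    """
    n_genes = len(samples[0])
    return sum(
        1
        for gene in range(n_genes)
        if coefficient_of_variation([sample[gene] for sample in samples]) <= cv_threshold
    )


# Toy data: the second sample was sequenced twice as deeply as the first.
raw = [[10, 20, 70], [20, 40, 140]]
print(count_uniform_genes(raw))                               # before normalization
print(count_uniform_genes(scale_to_total(raw, target=100.0)))  # after normalization
```

In this toy case depth is the only difference between the samples, so total-count scaling makes every gene uniform. Comparing candidate normalizations by the number of uniform genes they produce is the idea the abstract describes.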
17.  SRM Targeted Proteomics in Search for Biomarkers of HCV-Induced Progression of Fibrosis to Cirrhosis in HALT-C Patients 
Proteomics  2012;12(8):1244-1252.
The current gold standard for diagnosis of hepatic fibrosis and cirrhosis is the traditional invasive liver biopsy. It is desirable to assess hepatic fibrosis with noninvasive means. Targeted proteomic techniques allow an unbiased assessment of proteins and might be useful to identify proteins related to hepatic fibrosis. We utilized Selected Reaction Monitoring (SRM) targeted proteomics combined with an organ-specific blood protein strategy to identify and quantify 38 liver-specific proteins. A combination of protein C and retinol binding protein 4 in serum gave promising preliminary results as candidate biomarkers to distinguish patients at different stages of hepatic fibrosis due to chronic infection with hepatitis C virus (HCV). Also, alpha-1-B glycoprotein, complement factor H and insulin-like growth factor binding protein acid labile subunit performed well in distinguishing patients from healthy controls.
PMCID: PMC3766736  PMID: 22577025
hepatitis C; fibrosis; liver-specific blood biomarkers; quantitation; selected reaction monitoring
18.  Multi-study Integration of Brain Cancer Transcriptomes Reveals Organ-Level Molecular Signatures 
PLoS Computational Biology  2013;9(7):e1003148.
We utilized abundant transcriptomic data for the primary classes of brain cancers to study the feasibility of separating all of these diseases simultaneously based on molecular data alone. These signatures were based on a new method reported herein – Identification of Structured Signatures and Classifiers (ISSAC) – that resulted in a brain cancer marker panel of 44 unique genes. Many of these genes have established relevance to the brain cancers examined herein, with others having known roles in cancer biology. Analyses on large-scale data from multiple sources must deal with significant challenges associated with heterogeneity between different published studies, for it was observed that the variation among individual studies often had a larger effect on the transcriptome than did phenotype differences, as is typical. For this reason, we restricted ourselves to studying only cases where we had at least two independent studies performed for each phenotype, and also reprocessed all the raw data from the studies using a unified pre-processing pipeline. We found that learning signatures across multiple datasets greatly enhanced reproducibility and accuracy in predictive performance on truly independent validation sets, even when keeping the size of the training set the same. This was most likely due to the meta-signature encompassing more of the heterogeneity across different sources and conditions, while amplifying signal from the repeated global characteristics of the phenotype. When molecular signatures of brain cancers were constructed from all currently available microarray data, 90% phenotype prediction accuracy, or the accuracy of identifying a particular brain cancer from the background of all phenotypes, was found. Looking forward, we discuss our approach in the context of the eventual development of organ-specific molecular signatures from peripheral fluids such as the blood.
Author Summary
From a multi-study, integrated transcriptomic dataset, we identified a marker panel for differentiating major human brain cancers at the gene-expression level. The ISSAC molecular signatures for brain cancers, composed of 44 unique genes, are based on comparing expression levels of pairs of genes, and phenotype prediction follows a diagnostic hierarchy. We found that sufficient dataset integration across multiple studies greatly enhanced diagnostic performance on truly independent validation sets, whereas signatures learned from only one dataset typically led to high error rate. Molecular signatures of brain cancers, when obtained using all currently available gene-expression data, achieved 90% phenotype prediction accuracy. Thus, our integrative approach holds significant promise for developing organ-level, comprehensive, molecular signatures of disease.
PMCID: PMC3723500  PMID: 23935471
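The author summary above says the signatures are based on comparing expression levels of pairs of genes. The sketch below shows the general idea of a pairwise-comparison rule with a majority vote. It is not ISSAC: the diagnostic hierarchy is not reproduced, and the gene names, labels and voting rule are hypothetical placeholders chosen for illustration.

```python
def pair_votes(expression, gene_pairs):
    """For each (gene_a, gene_b) pair, vote True when gene_a is expressed above gene_b."""
    return [expression[gene_a] > expression[gene_b] for gene_a, gene_b in gene_pairs]


def classify(expression, gene_pairs, positive_label, negative_label):
    """Assign positive_label when a strict majority of gene pairs vote True."""
    votes = pair_votes(expression, gene_pairs)
    return positive_label if 2 * sum(votes) > len(votes) else negative_label


# Hypothetical expression profile and gene pairs (placeholder names).
profile = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 1.0, "GENE_D": 3.0}
pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"), ("GENE_A", "GENE_D")]

print(pair_votes(profile, pairs))
print(classify(profile, pairs, "phenotype_1", "phenotype_2"))
```

A rule of this kind uses only the relative ordering of two genes within one sample, so it is unaffected by any monotonic rescaling of that sample. That property may help when combining data from several studies, though the summary does not state it as the reason for the design.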
19.  Nanomedicine Targets CANCER 
Scientific American  2009;300(2):44-51.
PMCID: PMC3700418  PMID: 19186705
20.  Integration of biological networks and gene expression data using Cytoscape 
Nature protocols  2007;2(10):2366-2382.
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.
PMCID: PMC3685583  PMID: 17947979
21.  A Review of Computational Tools in microRNA Discovery 
Since microRNAs (miRNAs) were discovered, their impact on regulating various biological activities has been a surprising and exciting field. Knowing the entire repertoire of these small molecules is the first step to gain a better understanding of their function. High throughput discovery tools such as next-generation sequencing significantly increased the number of known miRNAs in different organisms in recent years. However, the process of being able to accurately identify miRNAs is still a complex and difficult task, requiring the integration of experimental approaches with computational methods. A number of prediction algorithms based on characteristics of miRNA molecules have been developed to identify new miRNA species. Different approaches have certain strengths and weaknesses and in this review, we aim to summarize several commonly used tools in metazoan miRNA discovery.
PMCID: PMC3654206  PMID: 23720668
isomer; machine learning; miRNA conservation; RNA secondary structure; sequence homology
22.  Systems Biology and P4 Medicine: Past, Present, and Future 
Studying complex biological systems in a holistic rather than a “one gene or one protein” at a time approach requires the concerted effort of scientists from a wide variety of disciplines. The Institute for Systems Biology (ISB) has seamlessly integrated these disparate fields to create a cross-disciplinary platform and culture in which “biology drives technology drives computation.” To achieve this platform/culture, it has been necessary for cross-disciplinary ISB scientists to learn one another’s languages and work together effectively in teams. The focus of this “systems” approach on disease has led to a discipline denoted systems medicine. The advent of technological breakthroughs in the fields of genomics, proteomics, and, indeed, the other “omics” is catalyzing striking advances in systems medicine that have transformed and are transforming diagnostic and therapeutic strategies. Systems medicine has united genomics and genetics through family genomics to more readily identify disease genes. It has made blood a window into health and disease. It is leading to the stratification of diseases (division into discrete subtypes) for proper impedance match against drugs and the stratification of patients into subgroups that respond to environmental challenges in a similar manner (e.g. response to drugs, response to toxins, etc.). The convergence of patient-activated social networks, big data and their analytics, and systems medicine has led to a P4 medicine that is predictive, preventive, personalized, and participatory. Medicine will focus on each individual. It will become proactive in nature. It will increasingly focus on wellness rather than disease. For example, in 10 years each patient will be surrounded by a virtual cloud of billions of data points, and we will have the tools to reduce this enormous data dimensionality into simple hypotheses about how to optimize wellness and avoid disease for each individual.
P4 medicine will be able to detect and treat perturbations in healthy individuals long before disease symptoms appear, thus optimizing the wellness of individuals and avoiding disease. P4 medicine will 1) improve health care, 2) reduce the cost of health care, and 3) stimulate innovation and new company creation. Health care is not the only subject that can benefit from such integrative, cross-disciplinary, and systems-driven platforms and cultures. Many other challenges plaguing our planet, such as energy, environment, nutrition, and agriculture can be transformed by using such an integrated and systems-driven approach.
PMCID: PMC3678833  PMID: 23908862
P4 medicine; systems medicine; systems biology; personalized medicine; disease stratification; patient stratification; systems-driven diagnostics
23.  N-Glycoproteome of E14.Tg2a Mouse Embryonic Stem Cells 
PLoS ONE  2013;8(2):e55722.
E14.Tg2a mouse embryonic stem (mES) cells are a widely used host in gene trap and gene targeting techniques. Molecular characterization of the host cells provides background information for a better understanding of the functions of knocked-out genes. Using a highly selective glycopeptide-capture approach with ordinary liquid chromatography-coupled mass spectrometry (LC-MS), we characterized the N-glycoproteins of E14.Tg2a cells and analyzed the close relationship between the obtained N-glycoproteome and cell-surface proteomes. Our results provide a global view of the molecular properties of cell surface proteins, in which receptors seem to be much more diverse but, on average, lower in abundance than transporters. In addition, our results provide a systematic view of E14.Tg2a N-glycosylation, from which we discovered some striking patterns, including an evolutionarily preserved and possibly functionally selected complementarity between N-glycosylation and the transmembrane structure in protein sequences. We also observed an environmentally influenced N-glycosylation pattern among glycoenzymes and extracellular matrix proteins. We hope that the acquired information enhances our molecular understanding of mES E14.Tg2a cells as well as of the biological roles played by N-glycosylation in cell biology in general.
PMCID: PMC3565968  PMID: 23405203
24.  A Systems Approach to Rheumatoid Arthritis 
PLoS ONE  2012;7(12):e51508.
Rheumatoid arthritis (RA) is a chronic autoimmune disease that primarily attacks synovial joints. Despite advances in the diagnosis and treatment of RA, novel molecular targets are still needed to improve diagnostic accuracy and therapeutic outcomes. Here, we present a systems approach that can effectively 1) identify core RA-associated genes (RAGs), 2) reconstruct RA-perturbed networks, and 3) select potential targets for the diagnosis and treatment of RA. By integrating multiple previously reported gene expression datasets, we first identified 983 core RAGs that show RA-dominant differential expression, compared to osteoarthritis (OA), across the datasets. Using the core RAGs, we then reconstructed RA-perturbed networks that delineate key RA-associated cellular processes and transcriptional regulation. The networks revealed that synovial fibroblasts play major roles in defining RA-perturbed processes, that anti-TNF-α therapy restored many RA-perturbed processes, and that 19 transcription factors (TFs) contribute substantially to the deregulation of the core RAGs in the RA-perturbed networks. Finally, we selected a list of potential molecular targets that can act as metrics or modulators of the RA-perturbed networks. These network models thus identify a panel of potential targets that will serve as an important resource for the discovery of therapeutic targets and diagnostic markers, and provide novel insights into RA pathogenesis.
PMCID: PMC3519858  PMID: 23240033
25.  Kaviar: an accessible system for testing SNV novelty 
Bioinformatics  2011;27(22):3216-3217.
Summary: With the rapidly expanding availability of data from personal genomes, exomes and transcriptomes, medical researchers will frequently need to test whether observed genomic variants are novel or known. This task requires downloading and handling large and diverse datasets from a variety of sources, and processing them with bioinformatics tools and pipelines. Alternatively, researchers can upload data to online tools, which may conflict with privacy requirements. We present here Kaviar, a tool that greatly simplifies the assessment of novel variants. Kaviar includes: (i) an integrated and growing database of genomic variation from diverse sources, including over 55 million variants from personal genomes, family genomes, transcriptomes, SNV databases and population surveys; and (ii) software for querying the database efficiently.
Availability: Kaviar is programmed in Perl and offered free of charge as Open Source Software. Kaviar may be used online as a programmatic web service or downloaded for local use. The database is also provided.
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3208392  PMID: 21965822
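The novelty test described in the Kaviar summary above can be illustrated with a minimal sketch. This is not Kaviar's code, database format, or web service API (Kaviar itself is written in Perl). It only assumes, for illustration, that a variant can be identified by chromosome, position, reference allele and alternate allele, and that the known variants fit in an in-memory set. The names `Variant`, `build_known_index` and `classify` are hypothetical, and the example records are made up.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: chromosome, 1-based position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def build_known_index(known: Iterable[Variant]) -> frozenset:
    """Collect previously reported variants into a set for constant-time lookup."""
    return frozenset(known)


def classify(observed: Iterable[Variant], known_index: frozenset) -> Dict[Variant, str]:
    """Label each observed variant 'known' if it is in the index, else 'novel'."""
    return {v: ("known" if v in known_index else "novel") for v in observed}


# Made-up example records, not real database content.
known = build_known_index([
    Variant("1", 1000, "A", "G"),
    Variant("2", 500, "C", "T"),
])
observed = [
    Variant("1", 1000, "A", "G"),  # already in the index
    Variant("3", 42, "G", "A"),    # absent from the index
]
result = classify(observed, known)
for variant, label in result.items():
    print(variant, label)
```

An in-memory set is only a stand-in here. A collection on the scale the summary mentions (over 55 million variants) would more plausibly sit behind an indexed on-disk store, which is a design choice this sketch does not attempt to model.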
