Results 1-25 (63)

1.  Realistic artificial DNA sequences as negative controls for computational genomics 
Nucleic Acids Research  2014;42(12):e99.
A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at
PMCID: PMC4081056  PMID: 24803667
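The abstract contrasts the authors' model-based artificial sequences with the common practice of "shuffling" real sequences. The minimal Python sketch below shows that baseline in its simplest form, a mononucleotide shuffle. It is not the authors' method, and the function names are hypothetical. The shuffle keeps base composition exactly but generally destroys higher-order structure such as dinucleotide frequencies and interspersed or simple repeats, which is one way shuffled backgrounds can differ from real intergenic DNA.

```python
import random
from collections import Counter
from typing import Optional


def shuffle_sequence(seq: str, seed: Optional[int] = None) -> str:
    """Mononucleotide shuffle: same base composition, randomized order."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)  # in-place Fisher-Yates shuffle
    return "".join(bases)


def kmer_counts(seq: str, k: int = 2) -> Counter:
    """Count overlapping k-mers (dinucleotides by default)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    # Toy "real" sequence with an (AC)n microsatellite and a poly-A run.
    real = "ACACACACACACGTTAGCATGCAAAAAAAA"
    fake = shuffle_sequence(real, seed=1)
    print(Counter(real) == Counter(fake))  # True: base composition is identical
    # Dinucleotide counts (and hence repeats) are generally not preserved.
    print(kmer_counts(real)["AC"], kmer_counts(fake)["AC"])
```

Dinucleotide-preserving shuffles exist as a refinement. Even so, the abstract's point stands that no shuffle reproduces interspersed repeat content.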
2.  New and improved proteomics technologies for understanding complex biological systems: Addressing a grand challenge in the life sciences 
Proteomics  2012;12(18):2773-2783.
This White Paper sets out a Life Sciences Grand Challenge for Proteomics Technologies to enhance our understanding of complex biological systems, link genomes with phenotypes, and bring broad benefits to the biosciences and the US economy. The paper is based on a workshop hosted by the National Institute of Standards and Technology (NIST) in Gaithersburg, MD, 14–15 February 2011, with participants from many federal R&D agencies and research communities, under the aegis of the US National Science and Technology Council (NSTC). Opportunities are identified for a coordinated R&D effort to achieve major technology-based goals and address societal challenges in health, agriculture, nutrition, energy, environment, national security, and economic development.
PMCID: PMC4005326  PMID: 22807061
Complex systems; Democratization of proteomics; Economic growth; Grand challenges; Integration; Systems biology
3.  Systems Cancer Medicine: Towards Realization of Predictive, Preventive, Personalized, and Participatory (P4) Medicine 
Journal of internal medicine  2012;271(2):111-121.
A grand challenge impeding optimal treatment outcomes for cancer patients arises from the complex nature of the disease: the cellular heterogeneity and the myriad of dysfunctional molecular and genetic networks that result from genetic (somatic) and environmental perturbations. Systems biology, with its holistic approach to understanding fundamental principles in biology, and the empowering technologies in genomics, proteomics, single-cell analysis, microfluidics, and computational strategies, enables a comprehensive approach to medicine, which strives to unveil the pathogenic mechanisms of diseases, identify disease biomarkers and begin thinking about new strategies for drug target discovery. The integration of multi-dimensional high-throughput “omics” measurements from tumor tissues and corresponding blood specimens, together with new systems strategies for diagnostics, enables the identification of cancer biomarkers that will allow presymptomatic diagnosis, stratification of disease, assessment of disease progression, evaluation of patient response to therapy, and the identification of recurrences. While some aspects of systems medicine are being adopted in clinical oncology practice through companion molecular diagnostics for personalized therapy, the mounting influx of global quantitative data from both wellness and disease is shaping a transformational paradigm in medicine that we have termed predictive, preventive, personalized, and participatory (P4) medicine, which requires new strategies, both scientific and organizational, to bring this revolution in medicine to patients and to the healthcare system. P4 medicine will have a profound impact on society: transforming the healthcare system, turning around the ever-escalating costs of healthcare, digitizing the practice of medicine and creating enormous economic opportunities for those organizations and nations that embrace this revolution.
PMCID: PMC3978383  PMID: 22142401
Systems medicine; cancer complexity; quantized cell populations; blood biomarkers; molecular diagnostics; P4 medicine
4.  Revolutionizing medicine in the 21st century through systems approaches 
Biotechnology journal  2012;7(8):992-1001.
Personalized medicine is a term for a revolution in medicine that envisions the individual patient as the central focus of healthcare in the future. The term “personalized medicine”, however, fails to reflect the enormous dimensionality of this new medicine that will be predictive, preventive, personalized, and participatory, a vision of medicine we have termed P4 medicine. This reflects a paradigm change in how medicine will be practiced that is revolutionary rather than evolutionary. P4 medicine arises from the confluence of a systems approach to medicine and from the digitalization of medicine that creates the large data sets necessary to deal with the complexities of disease. We predict that systems approaches will empower the transition from conventional reactive medical practice to a more proactive P4 medicine focused on wellness, will reverse the escalating costs of drug development, and will have enormous social and economic benefits. Our vision for P4 medicine in 10 years is that each patient will be associated with a virtual data cloud of billions of data points and that we will have the information technology for healthcare to reduce this enormous data dimensionality to simple hypotheses about health and/or disease for each individual. These data will be multi-scale across all levels of biological organization and extremely heterogeneous in type; this enormous amount of data represents a striking signal-to-noise (S/N) challenge. The key to dealing with this S/N challenge is to take a “holistic systems approach” to disease, as we will discuss in this article.
PMCID: PMC3962497  PMID: 22815171
Functional genomics; Network biology; Personalized medicine; Systems medicine
5.  Relationship Estimation from Whole-Genome Sequence Data 
PLoS Genetics  2014;10(1):e1004144.
The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. 
Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
Author Summary
The determination of the relationship between a pair of individuals is a fundamental application of genetics. The most accurate methods for relationship estimation rely on precise, localized estimates of genetic sharing between individuals. Earlier methods have generated these estimates from high-density genetic marker data. We performed relationship estimation using whole-genome sequence data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. Our results demonstrate that complexities specific to whole-genome sequencing result in regions of the genome that are prone to false-positive estimates of genetic sharing. We provide a map of these spurious IBD regions and introduce new methods, implemented in the software package ERSA 2.0, to control for spurious IBD. We show that ERSA 2.0 provides a 5% to 15% increase in relationship detection power for distant relationships with whole-genome sequence data relative to high-density genetic marker data.
PMCID: PMC3907355  PMID: 24497848
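The abstract describes identifying and masking genomic regions with excess pairwise IBD, but it does not give the ERSA 2.0 procedure. The Python sketch below is a hypothetical illustration of the general idea under stated assumptions. It counts how many pairwise IBD segments overlap each fixed-size bin and flags bins whose count exceeds a multiple of the median. The bin size, the `factor=3.0` threshold, and both function names are assumptions for illustration, not the published method.

```python
from statistics import median
from typing import List, Tuple


def ibd_bin_coverage(segments: List[Tuple[int, int]], chrom_length: int,
                     bin_size: int) -> List[int]:
    """Count how many IBD segments (one per individual pair) overlap each bin.

    Segments are half-open intervals [start, end) in base-pair coordinates.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    coverage = [0] * n_bins
    for start, end in segments:
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first, last + 1):
            coverage[b] += 1
    return coverage


def excess_ibd_bins(coverage: List[int], factor: float = 3.0) -> List[int]:
    """Return indices of bins whose coverage exceeds factor x the median coverage."""
    threshold = factor * max(median(coverage), 1)
    return [b for b, c in enumerate(coverage) if c > threshold]


if __name__ == "__main__":
    # Ten pairs share a spurious segment at 500-600 bp; three other segments are scattered.
    segments = [(500, 600)] * 10 + [(0, 300), (200, 450), (700, 1000)]
    cov = ibd_bin_coverage(segments, chrom_length=1000, bin_size=100)
    print(cov)                   # [1, 1, 2, 1, 1, 10, 0, 1, 1, 1]
    print(excess_ibd_bins(cov))  # [5]
```

In a real pipeline, segments overlapping the flagged bins would then be trimmed or discarded before relationship estimation. The threshold would also need calibration against control samples, such as the population samples the abstract mentions.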
6.  Correction: Optimal Scaling of Digital Transcriptomes 
PLoS ONE  2014;9(1):10.1371/annotation/8b05a9ab-c8ad-4276-a851-1e265055fb65.
PMCID: PMC3885755
7.  Quantitative Liver-Specific Protein Fingerprint in Blood: A Signature for Hepatotoxicity 
Theranostics  2014;4(2):215-228.
We discuss here a new approach to detecting hepatotoxicity by employing concentration changes of liver-specific blood proteins during disease progression. These proteins are capable of assessing the behaviors of their cognate liver biological networks for toxicity or disease perturbations. Blood biomarkers are highly desirable diagnostics as blood is easily accessible and bathes virtually all organs. Fifteen liver-specific blood proteins were identified as markers of acetaminophen (APAP)-induced hepatotoxicity using three proteomic technologies: label-free antibody microarrays, quantitative immunoblotting, and targeted iTRAQ mass spectrometry. Liver-specific blood proteins produced a toxicity signature of eleven elevated and four attenuated blood protein levels. These blood protein perturbations begin to provide a systems view of key mechanistic features of APAP-induced liver injury relating to glutathione and S-adenosyl-L-methionine (SAMe) depletion, mitochondrial dysfunction, and liver responses to the stress. Two markers, elevated membrane-bound catechol-O-methyltransferase (MB-COMT) and attenuated retinol binding protein 4 (RBP4), report hepatic injury significantly earlier than the current gold standard liver biomarker, alanine transaminase (ALT). These biomarkers were perturbed prior to onset of irreversible liver injury. Ideal markers should be applicable for both rodent model studies and human clinical trials. Five of these mouse liver-specific blood markers had human orthologs that were also found to be responsive to human hepatotoxicity. This panel of liver-specific proteins has the potential to effectively identify the early toxicity onset, the nature and extent of liver injury and report on some of the APAP-perturbed liver networks.
PMCID: PMC3900804  PMID: 24465277
liver injury; toxicity; biomarker; RBP4; COMT; CPS1; BHMT
8.  Participatory medicine: a driving force for revolutionizing healthcare 
Genome Medicine  2013;5(12):110.
PMCID: PMC3978637  PMID: 24360023
9.  Systems Approaches to Biology and Disease Enable Translational Systems Medicine 
Genomics, proteomics & bioinformatics  2012;10(4):10.1016/j.gpb.2012.08.004.
The development and application of systems strategies to biology and disease are transforming medical research and clinical practice at an unprecedented rate. In the foreseeable future, clinicians, medical researchers, and ultimately the consumers and patients will be increasingly equipped with a deluge of personal health information, e.g., whole genome sequences, molecular profiling of diseased tissues, and periodic multi-analyte blood testing of biomarker panels for disease and wellness. The convergence of these practices will enable accurate prediction of disease susceptibility and early diagnosis for actionable preventive schemes and personalized treatment regimens tailored to each individual. It will also entail proactive participation from all major stakeholders in the health care system. We are at the dawn of predictive, preventive, personalized, and participatory (P4) medicine, the full implementation of which requires marrying basic and clinical research through advanced systems thinking and the employment of high-throughput technologies in genomics, proteomics, nanofluidics, single-cell analysis, and computational strategies in a highly orchestrated discipline that we have termed translational systems medicine.
PMCID: PMC3844613  PMID: 23084773
Systems biology; P4 Medicine; Family genome sequencing; Targeted proteomics; Single-cell analysis
10.  Optimal Scaling of Digital Transcriptomes 
PLoS ONE  2013;8(11):e77885.
Deep sequencing of transcriptomes has become an indispensable tool for biology, enabling expression levels for thousands of genes to be compared across multiple samples. Since transcript counts scale with sequencing depth, counts from different samples must be normalized to a common scale prior to comparison. We analyzed fifteen existing and novel algorithms for normalizing transcript counts, and evaluated the effectiveness of the resulting normalizations. For this purpose we defined two novel and mutually independent metrics: (1) the number of “uniform” genes (genes whose normalized expression levels have a sufficiently low coefficient of variation), and (2) low Spearman correlation between normalized expression profiles of gene pairs. We also defined four novel algorithms, one of which explicitly maximizes the number of uniform genes, and compared the performance of all fifteen algorithms. The two most commonly used methods (scaling to a fixed total value, or equalizing the expression of certain ‘housekeeping’ genes) yielded particularly poor results, surpassed even by normalization based on randomly selected gene sets. Conversely, seven of the algorithms approached what appears to be optimal normalization. Three of these algorithms rely on the identification of “ubiquitous” genes: genes expressed in all the samples studied, but never at very high or very low levels. We demonstrate that these include a “core” of genes expressed in many tissues in a mutually consistent pattern, which is suitable for use as an internal normalization guide. The new methods yield robustly normalized expression values, which is a prerequisite for the identification of differentially expressed and tissue-specific genes as potential biomarkers.
PMCID: PMC3819321  PMID: 24223126
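The first metric in the abstract counts "uniform" genes, defined as genes whose normalized expression has a sufficiently low coefficient of variation (CV) across samples. The NumPy sketch below illustrates that metric together with scaling to a fixed total, one of the two common methods the authors found to perform poorly. The CV cutoff of 0.25, the total of one million, and the function names are assumptions for illustration; the abstract does not state the authors' threshold. This sketch does not implement any of the paper's new algorithms.

```python
import numpy as np


def scale_to_total(counts, total: float = 1e6) -> np.ndarray:
    """Scale each sample (column of a genes x samples matrix) to the same total count."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=0, keepdims=True) * total


def count_uniform_genes(normalized: np.ndarray, cv_threshold: float = 0.25) -> int:
    """Count genes (rows) whose coefficient of variation across samples is below the cutoff."""
    mean = normalized.mean(axis=1)
    sd = normalized.std(axis=1, ddof=1)  # sample standard deviation
    cv = np.full(mean.shape, np.inf)     # unexpressed genes are never "uniform"
    expressed = mean > 0
    cv[expressed] = sd[expressed] / mean[expressed]
    return int(np.sum(cv < cv_threshold))


if __name__ == "__main__":
    # Toy data: sample 2 was sequenced twice as deeply as sample 1.
    counts = [[10, 20], [30, 60], [60, 120]]
    print(count_uniform_genes(np.asarray(counts, dtype=float)))  # 0 before scaling
    print(count_uniform_genes(scale_to_total(counts)))           # 3 after scaling
```

In this toy case, total-count scaling works perfectly because only depth differs. The abstract's finding is that on real data, where a few highly expressed genes can dominate the total, this method yields particularly poor results.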
11.  SRM Targeted Proteomics in Search for Biomarkers of HCV-Induced Progression of Fibrosis to Cirrhosis in HALT-C Patients 
Proteomics  2012;12(8):1244-1252.
The current gold standard for diagnosis of hepatic fibrosis and cirrhosis is the traditional invasive liver biopsy. It is desirable to assess hepatic fibrosis with noninvasive means. Targeted proteomic techniques allow an unbiased assessment of proteins and might be useful to identify proteins related to hepatic fibrosis. We utilized Selected Reaction Monitoring (SRM) targeted proteomics combined with an organ-specific blood protein strategy to identify and quantify 38 liver-specific proteins. A combination of protein C and retinol binding protein 4 in serum gave promising preliminary results as candidate biomarkers to distinguish patients at different stages of hepatic fibrosis due to chronic infection with hepatitis C virus (HCV). Also, alpha-1-B glycoprotein, complement factor H and insulin-like growth factor binding protein acid labile subunit performed well in distinguishing patients from healthy controls.
PMCID: PMC3766736  PMID: 22577025
hepatitis C; fibrosis; liver-specific blood biomarkers; quantitation; selected reaction monitoring
12.  Multi-study Integration of Brain Cancer Transcriptomes Reveals Organ-Level Molecular Signatures 
PLoS Computational Biology  2013;9(7):e1003148.
We utilized abundant transcriptomic data for the primary classes of brain cancers to study the feasibility of separating all of these diseases simultaneously based on molecular data alone. Our signatures were based on a new method reported herein, Identification of Structured Signatures and Classifiers (ISSAC), which resulted in a brain cancer marker panel of 44 unique genes. Many of these genes have established relevance to the brain cancers examined herein, with others having known roles in cancer biology. Analyses of large-scale data from multiple sources must deal with significant challenges associated with heterogeneity between different published studies: as is typical, we observed that the variation among individual studies often had a larger effect on the transcriptome than did phenotype differences. For this reason, we restricted ourselves to studying only cases where we had at least two independent studies performed for each phenotype, and also reprocessed all the raw data from the studies using a unified pre-processing pipeline. We found that learning signatures across multiple datasets greatly enhanced reproducibility and accuracy in predictive performance on truly independent validation sets, even when keeping the size of the training set the same. This was most likely due to the meta-signature encompassing more of the heterogeneity across different sources and conditions, while amplifying signal from the repeated global characteristics of the phenotype. When molecular signatures of brain cancers were constructed from all currently available microarray data, 90% phenotype prediction accuracy (the accuracy of identifying a particular brain cancer from the background of all phenotypes) was achieved. Looking forward, we discuss our approach in the context of the eventual development of organ-specific molecular signatures from peripheral fluids such as the blood.
Author Summary
From a multi-study, integrated transcriptomic dataset, we identified a marker panel for differentiating major human brain cancers at the gene-expression level. The ISSAC molecular signatures for brain cancers, composed of 44 unique genes, are based on comparing expression levels of pairs of genes, and phenotype prediction follows a diagnostic hierarchy. We found that sufficient dataset integration across multiple studies greatly enhanced diagnostic performance on truly independent validation sets, whereas signatures learned from only one dataset typically led to high error rates. Molecular signatures of brain cancers, when obtained using all currently available gene-expression data, achieved 90% phenotype prediction accuracy. Thus, our integrative approach holds significant promise for developing organ-level, comprehensive, molecular signatures of disease.
PMCID: PMC3723500  PMID: 23935471
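The author summary states that the ISSAC signatures compare expression levels of pairs of genes and that prediction follows a diagnostic hierarchy. The Python sketch below is a hypothetical illustration of that general style of classifier, not ISSAC itself. Each node asks whether gene A is expressed above gene B within a single sample and branches accordingly. The gene names `G1` to `G4`, the class labels, and the tree shape are all invented for illustration. A useful property of within-sample pair comparisons is that any strictly increasing rescaling applied to a whole sample leaves every comparison unchanged. This fits the study-to-study heterogeneity the abstract emphasizes.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class PairNode:
    """One decision in a hierarchy: is gene_a expressed above gene_b in this sample?"""
    gene_a: str
    gene_b: str
    if_greater: "Tree"
    otherwise: "Tree"


Tree = Union[PairNode, str]  # a leaf is simply a class label


def classify(sample: Dict[str, float], tree: Tree) -> str:
    """Walk the hierarchy using only within-sample pairwise comparisons."""
    node = tree
    while isinstance(node, PairNode):
        if sample[node.gene_a] > sample[node.gene_b]:
            node = node.if_greater
        else:
            node = node.otherwise
    return node


if __name__ == "__main__":
    # Hypothetical two-level hierarchy with made-up genes and classes.
    tree = PairNode("G1", "G2", if_greater="class_A",
                    otherwise=PairNode("G3", "G4", if_greater="class_B",
                                       otherwise="class_C"))
    sample = {"G1": 2.0, "G2": 5.0, "G3": 7.0, "G4": 1.0}
    print(classify(sample, tree))  # class_B
    rescaled = {g: 10 * v + 3 for g, v in sample.items()}  # monotone per-sample rescaling
    print(classify(rescaled, tree))  # class_B (unchanged)
```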
13.  Nanomedicine Targets CANCER 
Scientific American  2009;300(2):44-51.
PMCID: PMC3700418  PMID: 19186705
14.  Integration of biological networks and gene expression data using Cytoscape 
Nature protocols  2007;2(10):2366-2382.
Cytoscape is a free software package for visualizing, modeling and analyzing molecular and genetic interaction networks. This protocol explains how to use Cytoscape to analyze the results of mRNA expression profiling, and other functional genomics and proteomics experiments, in the context of an interaction network obtained for genes of interest. Five major steps are described: (i) obtaining a gene or protein network, (ii) displaying the network using layout algorithms, (iii) integrating with gene expression and other functional attributes, (iv) identifying putative complexes and functional modules and (v) identifying enriched Gene Ontology annotations in the network. These steps provide a broad sample of the types of analyses performed by Cytoscape.
PMCID: PMC3685583  PMID: 17947979
15.  A Review of Computational Tools in microRNA Discovery 
Since microRNAs (miRNAs) were discovered, the study of their impact on regulating various biological activities has become a surprising and exciting field. Knowing the entire repertoire of these small molecules is the first step toward a better understanding of their function. High-throughput discovery tools such as next-generation sequencing have significantly increased the number of known miRNAs in different organisms in recent years. However, accurately identifying miRNAs is still a complex and difficult task, requiring the integration of experimental approaches with computational methods. A number of prediction algorithms based on characteristics of miRNA molecules have been developed to identify new miRNA species. Different approaches have certain strengths and weaknesses; in this review, we aim to summarize several commonly used tools in metazoan miRNA discovery.
PMCID: PMC3654206  PMID: 23720668
isomer; machine learning; miRNA conservation; RNA secondary structure; sequence homology
16.  Systems Biology and P4 Medicine: Past, Present, and Future 
Studying complex biological systems in a holistic rather than a “one gene or one protein” at a time approach requires the concerted effort of scientists from a wide variety of disciplines. The Institute for Systems Biology (ISB) has seamlessly integrated these disparate fields to create a cross-disciplinary platform and culture in which “biology drives technology drives computation.” To achieve this platform/culture, it has been necessary for cross-disciplinary ISB scientists to learn one another’s languages and work together effectively in teams. The focus of this “systems” approach on disease has led to a discipline denoted systems medicine. The advent of technological breakthroughs in the fields of genomics, proteomics, and, indeed, the other “omics” is catalyzing striking advances in systems medicine that have transformed and are transforming diagnostic and therapeutic strategies. Systems medicine has united genomics and genetics through family genomics to more readily identify disease genes. It has made blood a window into health and disease. It is leading to the stratification of diseases (division into discrete subtypes) for proper impedance match against drugs and the stratification of patients into subgroups that respond to environmental challenges in a similar manner (e.g. response to drugs, response to toxins, etc.). The convergence of patient-activated social networks, big data and their analytics, and systems medicine has led to a P4 medicine that is predictive, preventive, personalized, and participatory. Medicine will focus on each individual. It will become proactive in nature. It will increasingly focus on wellness rather than disease. For example, in 10 years each patient will be surrounded by a virtual cloud of billions of data points, and we will have the tools to reduce this enormous data dimensionality into simple hypotheses about how to optimize wellness and avoid disease for each individual.
P4 medicine will be able to detect and treat perturbations in healthy individuals long before disease symptoms appear, thus optimizing the wellness of individuals and avoiding disease. P4 medicine will 1) improve health care, 2) reduce the cost of health care, and 3) stimulate innovation and new company creation. Health care is not the only subject that can benefit from such integrative, cross-disciplinary, and systems-driven platforms and cultures. Many other challenges plaguing our planet, such as energy, environment, nutrition, and agriculture, can be transformed by using such an integrated and systems-driven approach.
PMCID: PMC3678833  PMID: 23908862
P4 medicine; systems medicine; systems biology; personalized medicine; disease stratification; patient stratification; systems-driven diagnostics
17.  N-Glycoproteome of E14.Tg2a Mouse Embryonic Stem Cells 
PLoS ONE  2013;8(2):e55722.
E14.Tg2a mouse embryonic stem (mES) cells are a widely used host in gene trap and gene targeting techniques. Molecular characterization of host cells will provide background information for a better understanding of functions of the knockout genes. Using a highly selective glycopeptide-capture approach with ordinary liquid chromatography-coupled mass spectrometry (LC-MS), we characterized the N-glycoproteins of E14.Tg2a cells and analyzed the close relationship between the obtained N-glycoproteome and cell-surface proteomes. Our results provide a global view of cell surface protein molecular properties, in which receptors seem to be, on average, much more diverse but lower in abundance than transporters. In addition, our results provide a systematic view of E14.Tg2a N-glycosylation, from which we discovered some striking patterns, including an evolutionarily preserved and possibly functionally selected complementarity between N-glycosylation and the transmembrane structure in protein sequences. We also observed an environmentally influenced N-glycosylation pattern among glycoenzymes and extracellular matrix proteins. We hope that the acquired information enhances our molecular understanding of mES E14.Tg2a cells as well as of the biological roles played by N-glycosylation in cell biology in general.
PMCID: PMC3565968  PMID: 23405203
18.  A Systems Approach to Rheumatoid Arthritis 
PLoS ONE  2012;7(12):e51508.
Rheumatoid arthritis (RA) is a chronic autoimmune disease that primarily attacks synovial joints. Despite the advances in diagnosis and treatment of RA, novel molecular targets are still needed to improve the accuracy of diagnosis and the therapeutic outcomes. Here, we present a systems approach that can effectively 1) identify core RA-associated genes (RAGs), 2) reconstruct RA-perturbed networks, and 3) select potential targets for diagnosis and treatment of RA. By integrating multiple previously reported gene expression datasets, we first identified 983 core RAGs that show RA-dominant differential expression, compared to osteoarthritis (OA), in the multiple datasets. Using the core RAGs, we then reconstructed RA-perturbed networks that delineate key RA-associated cellular processes and transcriptional regulation. The networks revealed that synovial fibroblasts play major roles in defining RA-perturbed processes, that anti-TNF-α therapy restored many RA-perturbed processes, and that 19 transcription factors (TFs) make major contributions to deregulation of the core RAGs in the RA-perturbed networks. Finally, we selected a list of potential molecular targets that can act as metrics or modulators of the RA-perturbed networks. Therefore, these network models identify a panel of potential targets that will serve as an important resource for the discovery of therapeutic targets and diagnostic markers, as well as providing novel insights into RA pathogenesis.
PMCID: PMC3519858  PMID: 23240033
19.  Kaviar: an accessible system for testing SNV novelty 
Bioinformatics  2011;27(22):3216-3217.
Summary: With the rapidly expanding availability of data from personal genomes, exomes and transcriptomes, medical researchers will frequently need to test whether observed genomic variants are novel or known. This task requires downloading and handling large and diverse datasets from a variety of sources, and processing them with bioinformatics tools and pipelines. Alternatively, researchers can upload data to online tools, which may conflict with privacy requirements. We present here Kaviar, a tool that greatly simplifies the assessment of novel variants. Kaviar includes: (i) an integrated and growing database of genomic variation from diverse sources, including over 55 million variants from personal genomes, family genomes, transcriptomes, SNV databases and population surveys; and (ii) software for querying the database efficiently.
Availability: Kaviar is programmed in Perl and offered free of charge as Open Source Software. Kaviar may be used online as a programmatic web service or downloaded for local use from The database is also provided.
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3208392  PMID: 21965822
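At its core, the task Kaviar addresses is asking whether each observed variant is already present in a database of known variation. The Python sketch below illustrates that lookup with an in-memory set keyed on chromosome, position, reference and alternate allele. It does not use Kaviar's actual interface, database format or web service. The `Variant` type, `normalize` and `split_novel` are hypothetical names. The chromosome-name harmonization ("chr1" vs "1") is an illustrative assumption. The sketch also ignores indel left-alignment, which real variant comparisons must handle.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based genomic position
    ref: str
    alt: str


def normalize(v: Variant) -> Variant:
    """Harmonize chromosome naming and allele case so equivalent records compare equal."""
    chrom = v.chrom[3:] if v.chrom.lower().startswith("chr") else v.chrom
    return Variant(chrom, v.pos, v.ref.upper(), v.alt.upper())


def split_novel(queries: Iterable[Variant],
                known: Iterable[Variant]) -> Tuple[List[Variant], List[Variant]]:
    """Partition query variants into (already known, novel) by exact key match."""
    known_set = {normalize(v) for v in known}
    seen, novel = [], []
    for q in queries:
        if normalize(q) in known_set:
            seen.append(q)
        else:
            novel.append(q)
    return seen, novel


if __name__ == "__main__":
    known = [Variant("chr1", 12345, "A", "G"), Variant("2", 500, "C", "T")]
    queries = [Variant("1", 12345, "a", "g"), Variant("2", 500, "C", "A")]
    seen, novel = split_novel(queries, known)
    print(seen)   # [Variant(chrom='1', pos=12345, ref='a', alt='g')]
    print(novel)  # [Variant(chrom='2', pos=500, ref='C', alt='A')]
```

A hash set gives constant-time lookups per query on average. That is adequate for a local copy of a large variant collection held in memory, which is one reason a downloadable database, as offered here, avoids uploading private data to an online tool.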
20.  Silencing SOX2 Induced Mesenchymal-Epithelial Transition and Its Expression Predicts Liver and Lymph Node Metastasis of CRC Patients 
PLoS ONE  2012;7(8):e41335.
SOX2 is an important stem cell marker and plays important roles in development and carcinogenesis. However, the role of SOX2 in Epithelial-Mesenchymal Transition has not been investigated. We demonstrated, for the first time, that SOX2 is involved in the Epithelial-Mesenchymal Transition (EMT) process, as knockdown of SOX2 in colorectal cancer (CRC) SW620 cells induced a Mesenchymal-Epithelial Transition (MET) process with recognized changes in the expression of key genes involved in the EMT process, including E-cadherin and vimentin. In addition, we provided a link between SOX2 activity and the WNT pathway by showing that knockdown of SOX2 reduced WNT pathway activity in CRC cells. We further demonstrated that SOX2 is involved in cell migration and invasion in vitro and in metastasis in vivo for CRC cells, and that the process might be mediated through MMP2 activity. Finally, an IHC analysis of 44 cases of colorectal cancer patients suggested that SOX2 is a prognostic marker for metastasis of colorectal cancers.
PMCID: PMC3422347  PMID: 22912670
21.  In complex biology, prior knowledge is power 
Cell  2011;144(6):860-863.
Complexity is the grand challenge for science and engineering in the 21st century. We suggest that biology is a discipline that is uniquely situated to tackle complexity, through a diverse array of technologies for characterizing molecular structure, interactions and function. A major difficulty in the analysis of complex biological systems is dealing with the low signal-to-noise ratio inherent in nearly all large-scale biological data sets. We discuss powerful bioinformatic concepts for boosting signal-to-noise through external knowledge incorporated in processing units we call Filters and Integrators. These concepts are illustrated in four landmark studies that have provided model implementations of Filters, Integrators, or both.
PMCID: PMC3102020  PMID: 21414478
22.  Principal network analysis: identification of subnetworks representing major dynamics using gene expression data 
Bioinformatics  2010;27(3):391-398.
Motivation: Systems biology attempts to describe complex systems behaviors in terms of dynamic operations of biological networks. However, there is a lack of tools that can effectively decode complex network dynamics over multiple conditions.
Results: We present principal network analysis (PNA), a method that can automatically capture major dynamic activation patterns over multiple conditions and then generate protein and metabolic subnetworks for the captured patterns. We first demonstrated the utility of this method by applying it to a synthetic dataset. The results showed that PNA correctly captured the subnetworks representing dynamics in the data. We further applied PNA to two time-course gene expression profiles collected from (i) MCF7 cells after treatment with HRG at multiple doses and (ii) brain samples of four strains of mice infected with two prion strains. The resulting subnetworks and their interactions revealed network dynamics associated with HRG dose-dependent regulation of cell proliferation and differentiation and early PrPSc accumulation during prion infection.
Availability: The web-based software is available at:
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3031040  PMID: 21193522
23.  RNASEQR—a streamlined and accurate RNA-seq sequence analysis program 
Nucleic Acids Research  2011;40(6):e42.
The next-generation sequencing (NGS)-based transcriptomic profiling method often called RNA-seq has been widely used to study global gene expression, alternative exon usage, new exon discovery, novel transcriptional isoforms and genomic sequence variations. However, this technique also poses many biological and informatics challenges to extracting meaningful biological information. RNA-seq data analysis is built on the foundation of high-quality initial genome localization and alignment information for RNA-seq sequences. Toward this goal, we have developed RNASEQR to accurately and effectively map millions of RNA-seq sequences. We have systematically compared RNASEQR with four of the most widely used tools using a simulated data set created from the Consensus CDS project and two experimental RNA-seq data sets generated from a human glioblastoma patient. Our results showed that RNASEQR yields more accurate estimates of gene expression, complete gene structures and new transcript isoforms, as well as more accurate detection of single nucleotide variants (SNVs). RNASEQR analyzes raw data from RNA-seq experiments effectively and outputs results in a format compatible with a wide variety of specialized downstream analyses on desktop computers.
PMCID: PMC3315322  PMID: 22199257
24.  Extracellular microRNA: a new source of biomarkers 
Mutation research  2011;717(1-2):85-90.
MicroRNAs (miRNAs) are a recently discovered class of small, non-coding RNAs that regulate protein levels post-transcriptionally. miRNAs play important regulatory roles in many cellular processes, including differentiation, neoplastic transformation, and cell replication and regeneration. Because of these regulatory roles, it is not surprising that aberrant miRNA expression has been implicated in several diseases. Recent studies have reported significant levels of miRNAs in serum and other body fluids, raising the possibility that circulating miRNAs could serve as useful clinical biomarkers. Here, we provide a brief overview of miRNA biogenesis and function, the identification and potential roles of circulating extracellular miRNAs, and the prospective uses of miRNAs as clinical biomarkers. Finally, we address several issues associated with the accurate measurement of miRNAs from biological samples.
PMCID: PMC3199035  PMID: 21402084
25.  Down-Regulation of Shadoo in Prion Infections Traces a Pre-Clinical Event Inversely Related to PrPSc Accumulation 
PLoS Pathogens  2011;7(11):e1002391.
During prion infections of the central nervous system (CNS) the cellular prion protein, PrPC, is templated to a conformationally distinct form, PrPSc. Recent studies have demonstrated that the Sprn gene encodes a GPI-linked glycoprotein, Shadoo (Sho), which localizes to a similar membrane environment as PrPC and is reduced in the brains of rodents with terminal prion disease. Here, analyses of prion-infected mice revealed that down-regulation of Sho protein was not related to Sprn mRNA abundance at any stage in prion infection. Down-regulation was robust upon propagation of a variety of prion strains in Prnpa and Prnpb mice, with the exception of the mouse-adapted BSE strain 301V. In addition, Sho encoded by a TgSprn transgene was down-regulated to the same extent as endogenous Sho. Reduced Sho levels were not seen in a tauopathy, in chemically induced spongiform degeneration or in transgenic mice expressing the extracellular ADan amyloid peptide of familial Danish dementia. Insofar as prion-infected Prnp hemizygous mice exhibited accumulation of PrPSc and down-regulation of Sho hundreds of days prior to onset of neurologic symptoms, Sho depletion can be excluded as an important trigger for clinical disease or as a simple consequence of neuronal damage. These studies instead define a disease-specific effect, and we hypothesize that membrane-associated Sho constitutes a bystander substrate for processes degrading PrPSc. Thus, while protease-resistant PrP detected by in vitro digestion allows post mortem diagnosis, decreased levels of endogenous Sho may trace an early response to PrPSc accumulation that operates in the CNS in vivo. This cellular response may offer new insights into the homeostatic mechanisms involved in detection and clearance of the misfolded proteins that drive prion disease pathogenesis.
Author Summary
In prion infections of the nervous system the cellular prion protein, PrPC, changes to a distinct form, PrPSc. Recent studies have demonstrated that another glycoprotein, Shadoo (Sho), which occupies a similar membrane environment as PrPC, is reduced in the brains of rodents with terminal prion disease. Our analyses of prion-infected mice revealed that reduction of Sho protein was not due to reductions in the corresponding messenger RNA. Reduction in Sho was clearly evident upon propagation of a variety of prion strains, but was not seen in mice with other types of neurodegenerative disease. Also, as prion-infected mice with only one copy of the PrP gene exhibited both accumulation of PrPSc and a reduction of Sho protein hundreds of days prior to onset of neurologic symptoms, the drop in Sho protein level can be excluded as an important trigger for clinical disease, or as a non-specific consequence of brain cell damage. Instead, our studies define an effect restricted to prion disease, and we hypothesize that Sho protein is a “bystander” for degradative processes aimed at destroying PrPSc.
PMCID: PMC3219720  PMID: 22114562