Related Articles

1.  GPX-Macrophage Expression Atlas: A database for expression profiles of macrophages challenged with a variety of pro-inflammatory, anti-inflammatory, benign and pathogen insults 
BMC Genomics  2005;6:178.
Macrophages play an integral role in the host immune system, bridging innate and adaptive immunity. As such, they are finely attuned to extracellular and intracellular stimuli and respond by rapidly initiating multiple signalling cascades with diverse effector functions. The macrophage cell is therefore an experimentally and clinically amenable biological system for the mapping of biological pathways. The goal of the macrophage expression atlas is to systematically investigate the pathway biology and interaction network of macrophages challenged with a variety of insults, in particular via infection and activation with key inflammatory mediators. As an important first step towards this we present a single searchable database resource containing high-throughput macrophage gene expression studies.
The GPX Macrophage Expression Atlas (GPX-MEA) is an online resource for gene expression based studies of a range of macrophage cell types following treatment with pathogens and immune modulators. GPX-MEA follows the MIAME standard and includes an objective quality score with each experiment. It places special emphasis on rigorously capturing the experimental design and enables the searching of expression data from different microarray experiments. Studies may be queried on the basis of experimental parameters, sample information and quality assessment score. The ability to compare the expression values of individual genes across multiple experiments is provided. In addition, the database offers access to experimental annotation and analysis files and includes experiments and raw data previously unavailable to the research community.
GPX-MEA is the first example of a quality scored gene expression database focussed on a macrophage cellular system that allows efficient identification of transcriptional patterns. The resource will provide novel insights into the phenotypic response of macrophages to a variety of benign, inflammatory, and pathogen insults. GPX-MEA is available through the GPX website at .
PMCID: PMC1351201  PMID: 16343346
2.  Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins 
PLoS Biology  2009;7(4):e1000096.
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
Author Summary
One goal of modern biology is to chart groups of proteins that act together to perform biological processes via direct and indirect interactions. Such groupings are sometimes called functional modules. The types of protein interactions within modules include physical interactions that generate protein complexes and biochemical associations that make up metabolic pathways. We have combined proteomic and bioinformatic tools, and used them to decipher a large number of protein interactions, complexes, and functional modules with high confidence. In addition, exploring the topology of the resulting interaction networks, we successfully predicted specific biological roles for a number of proteins with previously unknown functions, and identified some potential drug targets. Although our work is focused on E. coli, our phylogenetic projections suggest that a considerable fraction of our observations and predictions can be extrapolated to many other bacterial taxa. As all the data derived from this study are publicly available, others may build on our work for further hypothesis-driven studies of gene function discovery.
A novel resource integrating proteomic and genome context-based tools provides a "systems-wide" functional blueprint of E. coli, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
PMCID: PMC2672614  PMID: 19402753
3.  The Biological Connection Markup Language: a SBGN-compliant format for visualization, filtering and analysis of biological pathways 
Bioinformatics  2011;27(15):2127-2133.
Motivation: Many models and analyses of signaling pathways have been proposed. However, none of them takes into account that a biological pathway is not a fixed system but depends on the organism, tissue and cell type as well as on physiological, pathological and experimental conditions.
Results: The Biological Connection Markup Language (BCML) is a format to describe, annotate and visualize pathways. BCML can store multiple types of information, permitting a selective view of the pathway as it exists and/or behaves in specific organisms, tissues and cells. Furthermore, BCML can be automatically converted into data formats suitable for analysis and into a fully SBGN-compliant graphical representation, making it an important tool that can be used by both computational biologists and ‘wet lab’ scientists.
Availability and implementation: The XML schema and the BCML software suite are freely available under the LGPL for download at They are implemented in Java and supported on MS Windows, Linux and OS X.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3137220  PMID: 21653523
4.  An expression atlas of human primary cells: inference of gene function from coexpression networks 
BMC Genomics  2013;14:632.
The specialisation of mammalian cells in time and space requires genes associated with specific pathways and functions to be co-ordinately expressed. Here we have combined a large number of publicly available microarray datasets derived from human primary cells and analysed large correlation graphs of these data.
Using the network analysis tool BioLayout Express3D we identify robust co-associations of genes expressed in a wide variety of cell lineages. We discuss the biological significance of a number of these associations, in particular the coexpression of key transcription factors with the genes that they are likely to control.
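The correlation-graph idea described above can be illustrated with a minimal sketch. It is not the BioLayout Express3D implementation: the function names, the toy expression profiles and the 0.9 Pearson threshold are all assumptions chosen for illustration. Genes become nodes, and an edge is drawn whenever two expression profiles correlate at or above the threshold.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold.

    `profiles` maps a gene name to its expression values across samples.
    The threshold is an illustrative assumption, not a published cut-off.
    """
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical toy data: GENE_A and GENE_B rise together, GENE_C does not.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.1, 6.0, 8.2],
    "GENE_C": [4.0, 1.0, 3.5, 0.5],
}
edges = coexpression_edges(profiles, threshold=0.9)
```

Clusters of such a graph group genes by shared expression pattern, which is the basis for inferring function from coexpressed neighbours.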
We consider the regulation of genes in human primary cells and specifically in the human mononuclear phagocyte system. Of particular note is the fact that these data do not support the identity of putative markers of antigen-presenting dendritic cells, nor classification of M1 and M2 activation states, a current subject of debate within the immunological field. We have provided this data resource on the BioGPS web site ( and on (
PMCID: PMC3849585  PMID: 24053356
Clustering; Meta-analysis; Human; Primary cells; Dendritic cell; Macrophage; Microarray; Transcriptomics
5.  A gene expression atlas of the domestic pig 
BMC Biology  2012;10:90.
This work describes the first genome-wide analysis of the transcriptional landscape of the pig. A new porcine Affymetrix expression array was designed in order to provide comprehensive coverage of the known pig transcriptome. The new array was used to generate a genome-wide expression atlas of pig tissues derived from 62 tissue/cell types. These data were subjected to network correlation analysis and clustering.
The analysis presented here provides a detailed functional clustering of the pig transcriptome where transcripts are grouped according to their expression pattern, so one can infer the function of an uncharacterized gene from the company it keeps and the locations in which it is expressed. We describe the overall transcriptional signatures present in the tissue atlas, where possible assigning those signatures to specific cell populations or pathways. In particular, we discuss the expression signatures associated with the gastrointestinal tract, an organ that was sampled at 15 sites along its length and whose biology in the pig is similar to human. We identify sets of genes that define specialized cellular compartments and region-specific digestive functions. Finally, we performed a network analysis of the transcription factors expressed in the gastrointestinal tract and demonstrate how they sub-divide into functional groups that may control cellular gastrointestinal development.
The pig is an important livestock animal with a physiology more similar to that of man than the mouse's is. This atlas provides a major new resource for understanding gene expression with respect to the known physiology of mammalian tissues and cells. The data and analyses are available on the websites and
PMCID: PMC3814290  PMID: 23153189
pig; porcine; Sus scrofa; microarray; transcriptome; transcription network; pathway; gastrointestinal tract
6.  SBMLsqueezer: A CellDesigner plug-in to generate kinetic rate equations for biochemical networks 
BMC Systems Biology  2008;2:39.
The development of complex biochemical models has been facilitated through the standardization of machine-readable representations like SBML (Systems Biology Markup Language). This effort is accompanied by the ongoing development of the human-readable diagrammatic representation SBGN (Systems Biology Graphical Notation). The graphical SBML editor CellDesigner allows direct translation of SBGN into SBML, and vice versa. For the assignment of kinetic rate laws, however, this process is not straightforward, as it often requires manual assembly and specific knowledge of kinetic equations.
SBMLsqueezer facilitates exactly this modeling step via automated equation generation, overcoming the highly error-prone and cumbersome process of manually assigning kinetic equations. For each reaction the kinetic equation is derived from the stoichiometry, the participating species (e.g., proteins, mRNA or simple molecules) as well as the regulatory relations (activation, inhibition or other modulations) of the SBGN diagram. Such information allows distinctions between, for example, translation, phosphorylation or state transitions. The types of kinetics considered are numerous, for instance generalized mass-action, Hill, convenience and several Michaelis-Menten-based kinetics, each including activation and inhibition. These kinetics allow SBMLsqueezer to cover metabolic, gene regulatory, signal transduction and mixed networks. Whenever multiple kinetics are applicable to one reaction, parameter settings allow for user-defined specifications. After invoking SBMLsqueezer, the kinetic formulas are generated and assigned to the model, which can then be simulated in CellDesigner or with external ODE solvers. Furthermore, the equations can be exported to SBML, LaTeX or plain text format.
SBMLsqueezer considers the annotation of all participating reactants, products and regulators when generating rate laws for reactions. Thus, for each reaction, only applicable kinetic formulas are considered. This modeling scheme creates kinetics in accordance with the diagrammatic representation. In contrast most previously published tools have relied on the stoichiometry and generic modulators of a reaction, thus ignoring and potentially conflicting with the information expressed through the process diagram. Additional material and the source code can be found at the project homepage (URL found in the Availability and requirements section).
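To make the idea of deriving a rate law from stoichiometry concrete, here is a minimal sketch. It does not use or reproduce the SBMLsqueezer API: the function names and the parameter symbols (`kf`, `kr`, `kcat`, `Km`, `Vmax`) are hypothetical, and only two of the kinetics named above (mass-action and basic Michaelis-Menten) are covered.

```python
def mass_action_rate(reactants, products=None, reversible=False):
    """Build a mass-action rate law as a plain-text formula.

    `reactants` and `products` map species names to stoichiometric
    coefficients. Parameter names kf and kr are illustrative assumptions.
    """
    def term(constant, species):
        factors = [constant]
        for name, coeff in sorted(species.items()):
            factors.append(name if coeff == 1 else f"{name}^{coeff}")
        return " * ".join(factors)

    rate = term("kf", reactants)
    if reversible:
        rate += " - " + term("kr", products or {})
    return rate


def michaelis_menten_rate(substrate, enzyme=None):
    """Irreversible Michaelis-Menten rate law for a single substrate."""
    vmax = f"kcat * {enzyme}" if enzyme else "Vmax"
    return f"{vmax} * {substrate} / (Km + {substrate})"


# Hypothetical reaction A + 2 B <-> C
forward_only = mass_action_rate({"A": 1, "B": 2})
reversible = mass_action_rate({"A": 1, "B": 2}, {"C": 1}, reversible=True)
enzymatic = michaelis_menten_rate("S", enzyme="E")
```

A tool that also inspects species types and modulators, as described above, would choose between such templates per reaction rather than applying one form everywhere.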
PMCID: PMC2412839  PMID: 18447902
7.  A High-Resolution Anatomical Atlas of the Transcriptome in the Mouse Embryo 
PLoS Biology  2011;9(1):e1000582.
The manuscript describes the “digital transcriptome atlas” of the developing mouse embryo, a powerful resource to determine co-expression of genes, to identify cell populations and lineages and to identify functional associations between genes relevant to development and disease.
Ascertaining when and where genes are expressed is of crucial importance to understanding or predicting the physiological role of genes and proteins and how they interact to form the complex networks that underlie organ development and function. It is, therefore, crucial to determine on a genome-wide level, the spatio-temporal gene expression profiles at cellular resolution. This information is provided by colorimetric RNA in situ hybridization that can elucidate expression of genes in their native context and does so at cellular resolution. We generated what is to our knowledge the first genome-wide transcriptome atlas by RNA in situ hybridization of an entire mammalian organism, the developing mouse at embryonic day 14.5. This digital transcriptome atlas, the Eurexpress atlas (, consists of a searchable database of annotated images that can be interactively viewed. We generated anatomy-based expression profiles for over 18,000 coding genes and over 400 microRNAs. We identified 1,002 tissue-specific genes that are a source of novel tissue-specific markers for 37 different anatomical structures. The quality and the resolution of the data revealed novel molecular domains for several developing structures, such as the telencephalon, a novel organization for the hypothalamus, and insight on the Wnt network involved in renal epithelial differentiation during kidney development. The digital transcriptome atlas is a powerful resource to determine co-expression of genes, to identify cell populations and lineages, and to identify functional associations between genes relevant to development and disease.
Author Summary
In situ hybridization (ISH) can be used to visualize gene expression in cells and tissues in their native context. High-throughput ISH using nonradioactive RNA probes allowed the Eurexpress consortium to generate a comprehensive, interactive, and freely accessible digital gene expression atlas, the Eurexpress transcriptome atlas (, of the E14.5 mouse embryo. Expression data for over 15,000 genes were annotated for hundreds of anatomical structures, thus allowing us to systematically identify tissue-specific and tissue-overlapping gene networks. We illustrate the value of the Eurexpress atlas by finding novel regional subdivisions in the developing brain. We also use the transcriptome atlas to allocate specific components of the complex Wnt signaling pathway to kidney development, and we identify regionally expressed genes in liver that may be markers of hematopoietic stem cell differentiation.
PMCID: PMC3022534  PMID: 21267068
8.  An atlas of gene regulatory networks reveals multiple three-gene mechanisms for interpreting morphogen gradients 
Although >450 different topologies can achieve the same multicellular patterning function, they can be grouped into six main classes, which operate using different underlying dynamics.
Alternative designs for the same functions can therefore split into two types: (a) topology alterations that retain the same underlying dynamics and (b) alterations that utilize a completely different underlying dynamical mechanism.
This segregation of networks into distinct dynamical mechanisms can be revealed by the shape of the topology atlas itself.
Cell–cell communication is not usually part of the causal mechanism underlying a band-pass response during morphogen interpretation, but it can tune the result or increase robustness.
Understanding how gene regulatory networks (GRNs) achieve particular biological functions is a central question in systems biology. Systems biology promises to go beyond a case-by-case understanding of individual networks to map out the complete design space of mechanistic possibilities that underlie biological functions. Can such maps serve as useful theoretical frameworks in which to explore the general design principles for these functions? Towards addressing these questions, we created the first design space for a morphogen interpretation function.
In order to generate a design space for such a function, we enumerated all possible wiring designs of GRNs consisting of three genes and tested their ability to perform one particular morphogen interpretation function; stripe formation, as it represents a simplified form of the much studied French flag problem and is a commonly found gene expression pattern (Figure 1A). We found that only 5% of GRNs had the ability to generate a single stripe of gene expression when simulated with a fixed morphogen input in a one-dimensional model.
We hypothesized that the core mechanisms for producing the stripe of gene expression should be represented by topologies that contain only the necessary and sufficient gene–gene interactions for that function. Hence, we utilized the notions of complexity and neighborhood to generate a complexity atlas. GRNs of such an atlas (represented by nodes) are considered neighbors if they differ by a single gene–gene interaction (neighboring GRN nodes are connected by edges). Such a metagraph (graph of graphs) can then be reorganized using complexity (number of gene–gene interactions) to determine a GRN's position in the y axis, whereas GRNs are spaced in the x axis with the aim of reducing edge crossing (Figure 5A). This reorganization reveals a striking structure, where ‘stalactites' of complexity can be seen protruding from the bottom of the atlas. Each of these stalactites converges on a single ‘core' topology that by extensive analysis we find represents a distinct mechanism.
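The notions of complexity and neighborhood can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: a topology is encoded here as nine signed entries (activation +1, repression -1, absent 0) for every ordered gene pair including self-regulation, the morphogen input and symmetry reduction are ignored, and "differ by a single interaction" is read as a change in exactly one entry.

```python
from itertools import product

N_GENES = 3
N_EDGES = N_GENES * N_GENES  # ordered pairs, self-regulation included


def all_topologies():
    """Yield every wiring design as a tuple of 9 values in {-1, 0, +1}."""
    return product((-1, 0, 1), repeat=N_EDGES)


def complexity(topology):
    """Complexity = number of gene-gene interactions present."""
    return sum(1 for w in topology if w != 0)


def are_neighbors(t1, t2):
    """Assumed rule: two topologies are neighbors if exactly one entry differs."""
    return sum(1 for a, b in zip(t1, t2) if a != b) == 1


def neighbors(topology):
    """Yield every topology reachable by changing one interaction."""
    for i, w in enumerate(topology):
        for alt in (-1, 0, 1):
            if alt != w:
                yield topology[:i] + (alt,) + topology[i + 1:]


# Group the full enumeration by complexity, i.e. by y-axis level in the atlas.
topologies = list(all_topologies())
by_complexity = {}
for t in topologies:
    by_complexity.setdefault(complexity(t), []).append(t)
```

In an atlas built this way, only the topologies that pass a functional test (here, stripe formation) would be kept as nodes, and the edges from `neighbors` would connect them across adjacent complexity levels.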
The mechanisms employ a diverse range of distinct space–time behaviors, and the underlying core topologies display design features such as modularity and feed-forward. We mapped the mechanisms to the complexity atlas by analyzing how each particular GRN of the atlas was working. The GRNs functioning via the different mechanisms are highlighted by the different colors in Figure 5A. Mechanisms thus occupy large regions of separated topology space, suggesting them to be discrete. Analyzing transitions between mechanisms through parameter space confirms this to be the case.
We find that three of the mechanisms are employed in real patterning systems, including both blastoderm patterning in Drosophila and mesoderm specification in Xenopus (Figure 5B). The remaining three mechanisms are thus candidates for employment in other patterning systems. We explored the performance features of these mechanisms, which suggest that some have features such as robustness to parameter variation that make them highly likely to be employed in particular patterning contexts.
Only one of the six core mechanisms absolutely requires cell–cell communication for functionality, prompting us to predict that cell–cell communication will rarely be responsible for the basic dose response of morphogen interpretation networks. However, we show how cell–cell communication has an important role in robust stripe generation in the face of a noisy morphogen input and in fine tuning the quantitative details of stripe patterning.
In summary, the complexity atlas approach is amenable to any system with a clear genotype–function relationship. We demonstrate how certain functions such as morphogen interpretation may have a range of potential solutions in contrast to previous studies that analyzed more constrained functions. Furthermore, we demonstrate how such an approach can be utilized to define a ‘design space' for a given biological function that describes the different mechanistic possibilities and how they relate to one another (Figure 5). Such a design space can be used practically as a guide to discern which patterning mechanisms are likely to be at work in a particular context, throwing up less intuitive possibilities with powerful performance features.
The interpretation of morphogen gradients is a pivotal concept in developmental biology, and several mechanisms have been proposed to explain how gene regulatory networks (GRNs) achieve concentration-dependent responses. However, the number of different mechanisms that may exist for cells to interpret morphogens, and the importance of design features such as feedback or local cell–cell communication, is unclear. A complete understanding of such systems will require going beyond a case-by-case analysis of real morphogen interpretation mechanisms and mapping out a complete GRN ‘design space.' Here, we generate a first atlas of design space for GRNs capable of patterning a homogeneous field of cells into discrete gene expression domains by interpreting a fixed morphogen gradient. We uncover multiple very distinct mechanisms distributed discretely across the atlas, thereby expanding the repertoire of morphogen interpretation network motifs. Analyzing this diverse collection of mechanisms also allows us to predict that local cell–cell communication will rarely be responsible for the basic dose-dependent response of morphogen interpretation networks.
PMCID: PMC3010108  PMID: 21045819
design space; gene network; morphogen; patterning; systems biology
9.  Atlas of Gene Expression in the Developing Kidney at Microanatomic Resolution 
Developmental cell  2008;15(5):781-791.
Kidney development is based on differential cell type specific expression of a vast number of genes. While multiple critical genes and pathways have been elucidated, a genomewide analysis of gene expression within individual cellular and anatomic structures is lacking. Accomplishing this could provide significant new insights into fundamental developmental mechanisms such as mesenchymal-epithelial transition, inductive signaling, branching morphogenesis and segmentation. We describe here a comprehensive gene expression atlas of the developing mouse kidney based on the isolation of each major compartment by either laser capture microdissection or fluorescent activated cell sorting, followed by microarray profiling. The resulting data agrees with known expression patterns and additional in situ hybridizations. This kidney atlas allows a comprehensive analysis of the progression of gene expression states during nephrogenesis, as well as discovery of novel growth factor-receptor interactions. In addition, the results provide deeper insight into the genetic regulatory mechanisms of kidney development.
PMCID: PMC2653061  PMID: 19000842
10.  A comprehensive map of the influenza A virus replication cycle 
BMC Systems Biology  2013;7:97.
Influenza is a common infectious disease caused by influenza viruses. Annual epidemics cause severe illnesses, deaths, and economic loss around the world. To better defend against influenza viral infection, it is essential to understand its mechanisms and associated host responses. Many studies have been conducted to elucidate these mechanisms; however, the overall picture remains incompletely understood. A systematic understanding of influenza viral infection in host cells is needed to facilitate the identification of influential host response mechanisms and potential drug targets.
We constructed a comprehensive map of the influenza A virus (‘IAV’) life cycle (‘FluMap’) by undertaking a literature-based, manual curation approach. Based on information obtained from publicly available pathway databases, updated with literature-based information and input from expert virologists and immunologists, FluMap is currently composed of 960 factors (i.e., proteins, mRNAs etc.) and 456 reactions, and is annotated with ~500 papers and curation comments. In addition to detailing the type of molecular interactions, isolate/strain specific data are also available. The FluMap was built with the pathway editor CellDesigner in standard SBML (Systems Biology Markup Language) format and visualized as an SBGN (Systems Biology Graphical Notation) diagram. It is also available as a web service (online map) based on the iPathways+ system to enable community discussion by influenza researchers. We also demonstrate computational network analyses to identify targets using the FluMap.
The FluMap is a comprehensive pathway map that can serve as a graphically presented knowledge-base and as a platform to analyze functional interactions between IAV and host factors. Publicly available webtools will allow continuous updating to ensure the most reliable representation of the host-virus interaction network. The FluMap is available at
PMCID: PMC3819658  PMID: 24088197
Drug targets; FluMap; Host factors; Influenza virus; Pathways
11.  Metabolic network reconstruction of Chlamydomonas offers insight into light-driven algal metabolism 
A comprehensive genome-scale metabolic network of Chlamydomonas reinhardtii, including a detailed account of light-driven metabolism, is reconstructed and validated. The model provides a new resource for research of C. reinhardtii metabolism and in algal biotechnology.
The genome-scale metabolic network of Chlamydomonas reinhardtii (iRC1080) was reconstructed, accounting for >32% of the estimated metabolic genes encoded in the genome, and including extensive details of lipid metabolic pathways.
This is the first metabolic network to explicitly account for stoichiometry and wavelengths of metabolic photon usage, providing a new resource for research of C. reinhardtii metabolism and developments in algal biotechnology.
Metabolic functional annotation and the largest transcript verification of a metabolic network to date was performed, at least partially verifying >90% of the transcripts accounted for in iRC1080. Analysis of the network supports hypotheses concerning the evolution of latent lipid pathways in C. reinhardtii, including very long-chain polyunsaturated fatty acid and ceramide synthesis pathways.
A novel approach for modeling light-driven metabolism was developed that accounts for both light source intensity and spectral quality of emitted light. The constructs resulting from this approach, termed prism reactions, were shown to significantly improve the accuracy of model predictions, and their use was demonstrated for evaluation of light source efficiency and design.
Algae have garnered significant interest in recent years, especially for their potential application in biofuel production. The hallmark, model eukaryotic microalgae Chlamydomonas reinhardtii has been widely used to study photosynthesis, cell motility and phototaxis, cell wall biogenesis, and other fundamental cellular processes (Harris, 2001). Characterizing algal metabolism is key to engineering production strains and understanding photobiological phenomena. Based on extensive literature on C. reinhardtii metabolism, its genome sequence (Merchant et al, 2007), and gene functional annotation, we have reconstructed and experimentally validated the genome-scale metabolic network for this alga, iRC1080, the first network to account for detailed photon absorption permitting growth simulations under different light sources. iRC1080 accounts for 1080 genes, associated with 2190 reactions and 1068 unique metabolites and encompasses 83 subsystems distributed across 10 cellular compartments (Figure 1A). Its >32% coverage of estimated metabolic genes is a tremendous expansion over previous algal reconstructions (Boyle and Morgan, 2009; Manichaikul et al, 2009). The lipid metabolic pathways of iRC1080 are considerably expanded relative to existing networks, and chemical properties of all metabolites in these pathways are accounted for explicitly, providing sufficient detail to completely specify all individual molecular species: backbone molecule and stereochemical numbering of acyl-chain positions; acyl-chain length; and number, position, and cis–trans stereoisomerism of carbon–carbon double bonds. Such detail in lipid metabolism will be critical for model-driven metabolic engineering efforts.
We experimentally verified transcripts accounted for in the network under permissive growth conditions, detecting >90% of tested transcript models (Figure 1B) and providing validating evidence for the contents of iRC1080. We also analyzed the extent of transcript verification by specific metabolic subsystems. Some subsystems stood out as more poorly verified, including chloroplast and mitochondrial transport systems and sphingolipid metabolism, all of which exhibited <80% of transcripts detected, reflecting incomplete characterization of compartmental transporters and supporting a hypothesis of latent pathway evolution for ceramide synthesis in C. reinhardtii. Additional lines of evidence from the reconstruction effort similarly support this hypothesis including lack of ceramide synthetase and other annotation gaps downstream in sphingolipid metabolism. A similar hypothesis of latent pathway evolution was established for very long-chain fatty acids (VLCFAs) and their polyunsaturated analogs (VLCPUFAs) (Figure 1C), owing to the absence of this class of lipids in previous experimental measurements, lack of a candidate VLCFA elongase in the functional annotation, and additional downstream annotation gaps in arachidonic acid metabolism.
The network provides a detailed account of metabolic photon absorption by light-driven reactions, including photosystems I and II, light-dependent protochlorophyllide oxidoreductase, provitamin D3 photoconversion to vitamin D3, and rhodopsin photoisomerase; this network accounting permits the precise modeling of light-dependent metabolism. iRC1080 accounts for effective light spectral ranges through analysis of biochemical activity spectra (Figure 3A), either reaction activity or absorbance at varying light wavelengths. Defining effective spectral ranges associated with each photon-utilizing reaction enabled our network to model growth under different light sources via stoichiometric representation of the spectral composition of emitted light, termed prism reactions. Coefficients for different photon wavelengths in a prism reaction correspond to the ratios of photon flux in the defined effective spectral ranges to the total emitted photon flux from a given light source (Figure 3B). This approach distinguishes the amount of emitted photons that drive different metabolic reactions. We created prism reactions for most light sources that have been used in published studies for algal and plant growth including solar light, various light bulbs, and LEDs. We also included regulatory effects, resulting from lighting conditions insofar as published studies enabled. Light and dark conditions have been shown to affect metabolic enzyme activity in C. reinhardtii on multiple levels: transcriptional regulation, chloroplast RNA degradation, translational regulation, and thioredoxin-mediated enzyme regulation. Through application of our light model and prism reactions, we were able to closely recapitulate experimental growth measurements under solar, incandescent, and red LED lights. Through unbiased sampling, we were able to establish the tremendous statistical significance of the accuracy of growth predictions achievable through implementation of prism reactions. 
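The coefficient rule for prism reactions stated above (photon flux within each effective spectral range divided by the total emitted photon flux) can be sketched as follows. The function name, the toy emission spectrum and the two wavelength ranges are assumptions for illustration and are not values from iRC1080.

```python
def prism_coefficients(spectrum, effective_ranges):
    """Compute prism-reaction coefficients for one light source.

    `spectrum` maps wavelength (nm) to emitted photon flux.
    `effective_ranges` maps a photon species name to its (low, high)
    effective wavelength range in nm, bounds inclusive.
    Each coefficient is flux inside the range over total emitted flux.
    """
    total = sum(spectrum.values())
    if total <= 0:
        raise ValueError("light source emits no photons")
    coefficients = {}
    for name, (low, high) in effective_ranges.items():
        in_range = sum(
            flux for wavelength, flux in spectrum.items()
            if low <= wavelength <= high
        )
        coefficients[name] = in_range / total
    return coefficients


# Hypothetical light source and effective ranges (illustrative only).
spectrum = {450: 10.0, 550: 20.0, 650: 50.0, 680: 20.0}
effective_ranges = {
    "photon_blue": (400, 500),
    "photon_red": (640, 700),
}
coefficients = prism_coefficients(spectrum, effective_ranges)
```

Photons outside every effective range (the 550 nm flux in this toy spectrum) contribute to the total but drive no reaction, which is how such a representation separates useful from unused emitted light.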
Finally, application of the photosynthetic model was demonstrated prospectively to evaluate light utilization efficiency under different light sources. The results suggest that, of the existing light sources, red LEDs provide the greatest efficiency, about three times as efficient as sunlight. Extending this analysis, the model was applied to design a maximally efficient LED spectrum for algal growth. The result was a 677-nm peak LED spectrum with a total incident photon flux of 360 μE/m2/s, suggesting that for the simple objective of maximizing growth efficiency, LED technology has already reached an effective theoretical optimum.
In summary, the C. reinhardtii metabolic network iRC1080 that we have reconstructed offers insight into the basic biology of this species and may be employed prospectively for genetic engineering design and light source design relevant to algal biotechnology. iRC1080 was used to analyze lipid metabolism and generate novel hypotheses about the evolution of latent pathways. The predictive capacity of metabolic models developed from iRC1080 was demonstrated in simulating mutant phenotypes and in evaluation of light source efficiency. Our network provides a broad knowledgebase of the biochemistry and genomics underlying global metabolism of a photoautotroph, and our modeling approach for light-driven metabolism exemplifies how integration of largely unvisited data types, such as physicochemical environmental parameters, can expand the diversity of applications of metabolic networks.
Metabolic network reconstruction encompasses existing knowledge about an organism's metabolism and genome annotation, providing a platform for omics data analysis and phenotype prediction. The model alga Chlamydomonas reinhardtii is employed to study diverse biological processes from photosynthesis to phototaxis. Recent heightened interest in this species results from an international movement to develop algal biofuels. Integrating biological and optical data, we reconstructed a genome-scale metabolic network for this alga and devised a novel light-modeling approach that enables quantitative growth prediction for a given light source, resolving wavelength and photon flux. We experimentally verified transcripts accounted for in the network and physiologically validated model function through simulation and generation of new experimental growth data, providing high confidence in network contents and predictive applications. The network offers insight into algal metabolism and potential for genetic engineering and efficient light source design, a pioneering resource for studying light-driven metabolism and quantitative systems biology.
PMCID: PMC3202792  PMID: 21811229
Chlamydomonas reinhardtii; lipid metabolism; metabolic engineering; photobioreactor
12.  A Provisional Gene Regulatory Atlas for Mouse Heart Development 
PLoS ONE  2014;9(1):e83364.
Congenital Heart Disease (CHD) is one of the most common birth defects. Elucidating the molecular mechanisms underlying normal cardiac development is an important step towards early identification of abnormalities during the developmental program and towards the creation of early intervention strategies. We developed a novel computational strategy for leveraging high-content data sets, including a large selection of microarray data associated with mouse cardiac development, mouse genome sequence, ChIP-seq data of selected mouse transcription factors and Y2H data of mouse protein-protein interactions, to infer the active transcriptional regulatory network of mouse cardiac development. We identified phase-specific expression activity for 765 overlapping gene co-expression modules that were defined from the collected cardiac lineage microarray data. For each co-expression module, we identified the phase of cardiac development in which gene expression for that module was higher than in other phases. Co-expression modules were found to be consistent with biological pathway knowledge in Wikipathways, and met expectations for enrichment of pathways involved in heart lineage development. Over 359,000 transcription factor-target relationships were inferred by analyzing the promoter sequences within each gene module for overrepresentation against the JASPAR database of Transcription Factor Binding Site (TFBS) motifs. The provisional regulatory network will provide a framework for studying the genetic basis of CHD.
PMCID: PMC3885437  PMID: 24421884
13.  Analysis of Chaperone mRNA Expression in the Adult Mouse Brain by Meta Analysis of the Allen Brain Atlas 
PLoS ONE  2010;5(10):e13675.
The pathology of many neurodegenerative diseases is characterized by the accumulation of misfolded and aggregated proteins in various cell types and regional substructures throughout the central and peripheral nervous systems. The accumulation of these aggregated proteins signals dysfunction of cellular protein homeostatic mechanisms such as the ubiquitin/proteasome system, autophagy, and the chaperone network. Although there are several published studies in which transcriptional profiling has been used to examine gene expression in various tissues, including tissues of neurodegenerative disease models, there has not been a report that focuses exclusively on expression of the chaperone network. In the present study, we used the Allen Brain Atlas online database to analyze chaperone expression levels. This database utilizes a quantitative in situ hybridization approach and provides data on 270 chaperone genes within many substructures of the adult mouse brain. We determined that 256 of these chaperone genes are expressed at some level. Surprisingly, relatively few genes, only 30, showed significant variations in levels of mRNA across different substructures of the brain. The greatest degree of variability was exhibited by genes of the DnaJ co-chaperone, Tetratricopeptide repeat, and the HSPH families. Our analysis provides a valuable resource towards determining how variations in chaperone gene expression may modulate the vulnerability of specific neuronal populations of mammalian brain.
PMCID: PMC2965669  PMID: 21060842
14.  HMGB1 Mediates Endogenous TLR2 Activation and Brain Tumor Regression 
PLoS Medicine  2009;6(1):e1000010.
Glioblastoma multiforme (GBM) is the most aggressive primary brain tumor that carries a 5-y survival rate of 5%. Attempts at eliciting a clinically relevant anti-GBM immune response in brain tumor patients have met with limited success, which is due to brain immune privilege, tumor immune evasion, and a paucity of dendritic cells (DCs) within the central nervous system. Herein we uncovered a novel pathway for the activation of an effective anti-GBM immune response mediated by high-mobility-group box 1 (HMGB1), an alarmin protein released from dying tumor cells, which acts as an endogenous ligand for Toll-like receptor 2 (TLR2) signaling on bone marrow-derived GBM-infiltrating DCs.
Methods and Findings
Using a combined immunotherapy/conditional cytotoxic approach that utilizes adenoviral vectors (Ad) expressing Fms-like tyrosine kinase 3 ligand (Flt3L) and thymidine kinase (TK) delivered into the tumor mass, we demonstrated that CD4+ and CD8+ T cells were required for tumor regression and immunological memory. Increased numbers of bone marrow-derived, tumor-infiltrating myeloid DCs (mDCs) were observed in response to the therapy. Infiltration of mDCs into the GBM, clonal expansion of antitumor T cells, and induction of an effective anti-GBM immune response were TLR2 dependent. We then proceeded to identify the endogenous ligand responsible for TLR2 signaling on tumor-infiltrating mDCs. We demonstrated that HMGB1 was released from dying tumor cells, in response to Ad-TK (+ gancyclovir [GCV]) treatment. Increased levels of HMGB1 were also detected in the serum of tumor-bearing Ad-Flt3L/Ad-TK (+GCV)-treated mice. Specific activation of TLR2 signaling was induced by supernatants from Ad-TK (+GCV)-treated GBM cells; this activation was blocked by glycyrrhizin (a specific HMGB1 inhibitor) or with antibodies to HMGB1. HMGB1 was also released from melanoma, small cell lung carcinoma, and glioma cells treated with radiation or temozolomide. Administration of either glycyrrhizin or anti-HMGB1 immunoglobulins to tumor-bearing Ad-Flt3L and Ad-TK treated mice, abolished therapeutic efficacy, highlighting the critical role played by HMGB1-mediated TLR2 signaling to elicit tumor regression. Therapeutic efficacy of Ad-Flt3L and Ad-TK (+GCV) treatment was demonstrated in a second glioma model and in an intracranial melanoma model with concomitant increases in the levels of circulating HMGB1.
Our data provide evidence for the molecular and cellular mechanisms that support the rationale for the clinical implementation of antibrain cancer immunotherapies in combination with tumor killing approaches in order to elicit effective antitumor immune responses, and thus, will impact clinical neuro-oncology practice.
Maria Castro and colleagues use cell line and transgenic mouse approaches to study the mechanisms underlying the immune response to glioblastoma multiforme.
Editors' Summary
Every year, more than 175,000 people develop a primary brain tumor (a cancer that starts in the brain rather than spreading in from elsewhere). Like all cancers, brain tumors develop when a cell acquires genetic changes that allow it to grow uncontrollably and that change other aspects of its behavior, including the proteins it makes. There are many different types of cells in the brain and, as a result, there are many different types of brain tumors. However, one in five primary brain tumors is glioblastoma multiforme (GBM; also known as grade 4 astrocytoma), a particularly aggressive cancer. With GBM, the average time from diagnosis to death is one year and only one person in 20 survives for five years after a diagnosis of GBM. Symptoms of GBM include headaches, seizures, and changes in memory, mood, or mental capacity. Treatments for GBM, which include surgery, radiotherapy, and chemotherapy, do not “cure” the tumor but they can ease these symptoms.
Why Was This Study Done?
Better treatments for GBM are badly needed, and one avenue that is being explored is immunotherapy—a treatment in which the immune system is used to fight the cancer. Because many tumors make unusual proteins, the immune system can sometimes be encouraged to recognize tumor cells as foreign invaders and kill them. Unfortunately, attempts to induce a clinically useful anti-GBM immune response have been unsuccessful, partly because the brain contains very few dendritic cells, a type of immune system cell that kick-starts effective immune responses by presenting foreign proteins to other immune system cells. Another barrier to immunotherapy for GBM is immune evasion by the tumor. Many tumors develop ways to avoid the immune response as they grow. For example, they sometimes reduce the expression of proteins that the immune system might recognize as foreign. In this study, the researchers test a new combined treatment strategy for GBM in which dendritic cells are encouraged to enter the brain and tumor cells are killed to release proteins capable of stimulating an effective antitumor immune response.
What Did the Researchers Do and Find?
The researchers first established brain tumors in mice. Then, they injected harmless viruses carrying the genes for Fms-like tyrosine kinase 3 ligand (Flt3L; a protein that attracts dendritic cells) and for thymidine kinase (TK; cells expressing TK are killed by a drug called gancyclovir) into the tumor. Expression of both Flt3L and TK (but not of either protein alone) plus gancyclovir treatment shrank the tumors and greatly improved the survival of the mice. The researchers show that their strategy increased the migration of dendritic cells into the tumor provided they expressed an immune system protein called Toll-like receptor 2 (TLR2). TLR2 expression on the dendritic cells was also needed for an effective anti-tumor immune response and for tumor regression. TLR2 normally activates dendritic cells by binding to specific proteins on invading pathogens, so what was TLR2 binding to in the mouse tumors? The researchers reveal that TLR2 was responding to high-mobility-group box 1 (HMGB1), a protein released by the dying tumor cells, by showing that treatment of the tumor-bearing mice with the HMGB1 inhibitor glycyrrhizin blocked the therapeutic effect of Flt3L/TK expression. Finally, the researchers report that other tumor cell types release HMGB1 when they are killed and that the Flt3L/TK expression strategy can also kill other tumors growing in mouse brains.
What Do These Findings Mean?
Results obtained in mouse models of human diseases do not always lead to effective treatments for human patients. Nevertheless, the findings of this study provide new insights into how an effective immune response against brain tumors might be brought about. Most importantly, they show that an effective strategy might need both to attract dendritic cells into the brain tumor and to kill tumor cells, so they release proteins that can activate the dendritic cells. That is, the authors suggest it's important to combine immunotherapies with tumor-killing strategies to provide effective treatments for primary and metastatic brain tumors.
Additional Information
Please access these Web sites via the online version of this summary at
The US National Cancer Institute provides information about brain tumors for patients and health professionals and about the immune system and how it can be harnessed to fight cancer (in English and Spanish)
Cancer Research UK provides information on all aspects of brain tumors for patients and their caregivers
MedlinePlus provides links to further information about brain cancer (including some links to information in Spanish)
The American Brain Tumor Association provides brain tumor resources and information
The National Brain Tumor Society provides educational and support services regarding brain tumors
PMCID: PMC2621261  PMID: 19143470
15.  A HaemAtlas: characterizing gene expression in differentiated human blood cells 
Blood  2009;113(19):e1-e9.
Hematopoiesis is a carefully controlled process that is regulated by complex networks of transcription factors that are, in part, controlled by signals resulting from ligand binding to cell-surface receptors. To further understand hematopoiesis, we have compared gene expression profiles of human erythroblasts, megakaryocytes, B cells, cytotoxic and helper T cells, natural killer cells, granulocytes, and monocytes using whole genome microarrays. A bioinformatics analysis of these data was performed focusing on transcription factors, immunoglobulin superfamily members, and lineage-specific transcripts. We observed that the number of lineage-specific genes varies by 2 orders of magnitude, ranging from 5 for cytotoxic T cells to 878 for granulocytes. In addition, we have identified novel coexpression patterns for key transcription factors involved in hematopoiesis (eg, GATA3-GFI1 and GATA2-KLF1). This study represents the most comprehensive analysis of gene expression in hematopoietic cells to date and has identified genes that play key roles in lineage commitment and cell function. The data, which are freely accessible, will be invaluable for future studies on hematopoiesis and the role of specific genes and will also aid the understanding of the recent genome-wide association studies.
PMCID: PMC2680378  PMID: 19228925
16.  A comprehensive map of the mTOR signaling network 
The mammalian target of rapamycin (mTOR) is a central regulator of cell growth and proliferation. mTOR signaling is frequently dysregulated in oncogenic cells, and is thus an attractive target for anticancer therapy. Using CellDesigner, a modeling support software for graphical notation, we present herein a comprehensive map of the mTOR signaling network, which includes 964 species connected by 777 reactions. The map complies with both the systems biology markup language (SBML) and graphical notation (SBGN) for computational analysis and graphical representation, respectively. As captured in the mTOR map, we review and discuss our current understanding of the mTOR signaling network and highlight the impact of mTOR feedback and crosstalk regulations on drug-based cancer therapy. This map is available on the Payao platform, a Web 2.0 based community-wide interactive process for creating more accurate and information-rich databases. Thus, this comprehensive map of the mTOR network will serve as a tool to facilitate systems-level study of up-to-date mTOR network components and signaling events toward the discovery of novel regulatory processes and therapeutic strategies for cancer.
PMCID: PMC3018167  PMID: 21179025
cancer; CellDesigner; graphical notation; mTOR; regulatory network
17.  TobEA: an atlas of tobacco gene expression from seed to senescence 
BMC Genomics  2010;11:142.
Transcriptomics has resulted in the development of large data sets and tools for the progression of functional genomics and systems biology in many model organisms. Currently there is no commercially available microarray to allow such expression studies in Nicotiana tabacum (tobacco).
A custom-designed Affymetrix tobacco expression microarray was generated from a set of over 40k unigenes and used to measure gene expression in 19 different tobacco samples to produce the Tobacco Expression Atlas (TobEA). TobEA provides a snapshot of the transcriptional activity for thousands of tobacco genes in different tissues throughout the lifecycle of the plant and enables the identification of the biological processes occurring in these different tissues. 772 of 2513 transcription factors previously identified in tobacco were mapped to the array, with 87% of them being expressed in at least one tissue in the atlas. Putative transcriptional networks were identified based on the co-expression of these transcription factors. Several interactions in a floral identity transcription factor network were consistent with previous results from other plant species. To broaden access and maximise the benefit of TobEA, a set of tools was developed to provide researchers with expression information on their genes of interest via the Solanaceae Genomics Network (SGN) web site. The array has also been made available for public use via the Nottingham Arabidopsis Stock Centre microarray service.
The generation of a tobacco expression microarray is an important development for research in this model plant. The data provided by TobEA represents a valuable resource for plant functional genomics and systems biology research and can be used to identify gene targets for both fundamental and applied scientific applications in tobacco.
PMCID: PMC2841117  PMID: 20187945
18.  A Systems Biology Approach Reveals that Tissue Tropism to West Nile Virus Is Regulated by Antiviral Genes and Innate Immune Cellular Processes 
PLoS Pathogens  2013;9(2):e1003168.
The actions of the RIG-I like receptor (RLR) and type I interferon (IFN) signaling pathways are essential for a protective innate immune response against the emerging flavivirus West Nile virus (WNV). In mice lacking RLR or IFN signaling pathways, WNV exhibits enhanced tissue tropism, indicating that specific host factors of innate immune defense restrict WNV infection and dissemination in peripheral tissues. However, the immune mechanisms by which the RLR and IFN pathways coordinate and function to impart restriction of WNV infection are not well defined. Using a systems biology approach, we defined the host innate immune response signature and actions that restrict WNV tissue tropism. Transcriptional profiling and pathway modeling to compare WNV-infected permissive (spleen) and nonpermissive (liver) tissues showed high enrichment for inflammatory responses, including pattern recognition receptors and IFN signaling pathways, that define restriction of WNV replication in the liver. Assessment of infected livers from Mavs−/−×Ifnar−/− mice revealed the loss of expression of several key components within the natural killer (NK) cell signaling pathway, including genes associated with NK cell activation, inflammatory cytokine production, and NK cell receptor signaling. In vivo analysis of hepatic immune cell infiltrates from WT mice demonstrated that WNV infection leads to an increase in NK cell numbers with enhanced proliferation, maturation, and effector action. In contrast, livers from Mavs−/−×Ifnar−/− infected mice displayed reduced immune cell infiltration, including a significant reduction in NK cell numbers. Analysis of cocultures of dendritic and NK cells revealed both cell-intrinsic and -extrinsic roles for the RLR and IFN signaling pathways to regulate NK cell effector activity. 
Taken together, these observations reveal a complex innate immune signaling network, regulated by the RLR and IFN signaling pathways, that drives tissue-specific antiviral effector gene expression and innate immune cellular processes that control tissue tropism to WNV infection.
Author Summary
West Nile virus (WNV), a mosquito-transmitted RNA flavivirus, is an NIAID Category B infectious agent that has emerged in the Western hemisphere as a serious public health threat. The innate immune effectors that impart restriction of WNV infection are not well defined. WNV infection is sensed by the host RIG-I like receptors (RLR), a class of pattern recognition receptors, to trigger type I interferon (IFN) and related innate immune defense programs. Using a systems biology approach, we evaluated the contribution of the RLR and type I IFN signaling pathways in controlling tissue tropism. WNV infection triggers tissue-specific innate immune responses, specifically antiviral effector genes and natural killer (NK) cell signaling related genes, which are directly regulated by the combined actions of the RLR and type I IFN signaling pathways. Cocultures of dendritic and NK cells revealed that RLR and type I IFN signaling pathways are essential in promoting NK cell activation during WNV infection. Our observations indicate that combined RLR- and type I IFN-dependent signaling programs drive specific antiviral effector gene expression and program NK cell responses that, together, serve to restrict WNV tissue tropism.
PMCID: PMC3567171  PMID: 23544010
19.  CySBGN: A Cytoscape plug-in to integrate SBGN maps 
BMC Bioinformatics  2013;14:17.
A standard graphical notation is essential to facilitate exchange of network representations of biological processes. Towards this end, the Systems Biology Graphical Notation (SBGN) has been proposed, and it is already supported by a number of tools. However, support for SBGN in Cytoscape, one of the most widely used platforms in biology to visualise and analyse networks, is limited, and in particular it is not possible to import SBGN diagrams.
We have developed CySBGN, a Cytoscape plug-in that extends the use of Cytoscape visualisation and analysis features to SBGN maps. CySBGN adds support for Cytoscape users to visualize any of the three complementary SBGN languages: Process Description, Entity Relationship, and Activity Flow. The interoperability with other tools (CySBML plug-in and Systems Biology Format Converter) was also established allowing an automated generation of SBGN diagrams based on previously imported SBML models. The plug-in was tested using a suite of 53 different test cases that covers almost all possible entities, shapes, and connections. A rendering comparison with other tools that support SBGN was performed. To illustrate the interoperability with other Cytoscape functionalities, we present two analysis examples, shortest path calculation, and motif identification in a metabolic network.
CySBGN imports, modifies and analyzes SBGN diagrams in Cytoscape, and thus allows the application of the large palette of tools and plug-ins in this platform to networks and pathways in SBGN format.
PMCID: PMC3599859  PMID: 23324051
20.  Signaling network of dendritic cells in response to pathogens: a community-input supported knowledgebase 
BMC Systems Biology  2010;4:137.
Dendritic cells are antigen-presenting cells that play an essential role in linking the innate and adaptive immune systems. Much research has focused on the signaling pathways triggered upon infection of dendritic cells by various pathogens. The high level of activity in the field makes it desirable to have a pathway-based resource to access the information in the literature. Current pathway diagrams lack either comprehensiveness, or an open-access editorial interface. Hence, there is a need for a dependable, expertly curated knowledgebase that integrates this information into a map of signaling networks.
We have built a detailed diagram of the dendritic cell signaling network, with the goal of providing researchers with a valuable resource and a facile method for community input. Network construction has relied on comprehensive review of the literature and regular updates. The diagram includes detailed depictions of pathways activated downstream of different pathogen recognition receptors such as Toll-like receptors, retinoic acid-inducible gene-I-like receptors, C-type lectin receptors and nucleotide-binding oligomerization domain-like receptors. Initially assembled using CellDesigner software, it provides an annotated graphical representation of interactions stored in Systems Biology Mark-up Language. The network, which comprises 249 nodes and 213 edges, has been web-published through the Biological Pathway Publisher software suite. Nodes are annotated with PubMed references and gene-related information, and linked to a public wiki, providing a discussion forum for updates and corrections. To gain more insight into regulatory patterns of dendritic cell signaling, we analyzed the network using graph-theory methods: bifan, feedforward and multi-input convergence motifs were enriched. This emphasis on activating control mechanisms is consonant with a network that subserves persistent and coordinated responses to pathogen detection.
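As a rough illustration of the graph-theory analysis mentioned above, the sketch below counts feedforward motifs (X regulates Y, Y regulates Z, and X also regulates Z) in a small directed graph. It is a minimal sketch under stated assumptions: the function name and the toy edge list with abstract node labels are hypothetical, and it only counts motifs. Testing for enrichment, as the abstract reports, would additionally require comparison against randomized networks, which is not shown here.

```python
def count_feedforward_loops(edges):
    """Count ordered triples (x, y, z) of distinct nodes with edges
    x -> y, y -> z and x -> z in a directed graph given as (source, target) pairs.
    """
    # Build a successor map: node -> set of nodes it points to.
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)

    count = 0
    for x, x_targets in successors.items():
        for y in x_targets:
            if y == x:
                continue  # ignore self-loops
            for z in successors.get(y, ()):
                # z must be distinct from x and y and also a direct target of x.
                if z != x and z != y and z in x_targets:
                    count += 1
    return count


# Hypothetical toy network with abstract node labels.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
ffl_count = count_feedforward_loops(toy_edges)
print(ffl_count)
```

The toy network contains exactly one such motif (A to B, B to C, A to C); the edge from C to D closes no triangle and is not counted.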
This map represents a navigable aid for presenting a consensus view of the current knowledge on dendritic cell signaling that can be continuously improved through contributions of research community experts. Because the map is available in a machine readable format, it can be edited and may assist researchers in data analysis. Furthermore, the availability of a comprehensive knowledgebase might help further research in this area such as vaccine development. The dendritic cell signaling knowledgebase is accessible at
PMCID: PMC2958907  PMID: 20929569
21.  Nuclear Receptor Signaling Atlas (): hyperlinking the nuclear receptor signaling community 
Nucleic Acids Research  2005;34(Database issue):D221-D226.
The nuclear receptor signaling (NRS) field has generated a substantial body of information on nuclear receptors, their ligands and coregulators, with the ultimate goal of constructing coherent models of the biological and clinical significance of these molecules. As a component of the Nuclear Receptor Signaling Atlas (NURSA)—the development of a functional atlas of nuclear receptor biology—the NURSA Bioinformatics Resource is developing a strategy to organize and integrate legacy and future information on these molecules in a single web-based resource (). This entails parallel efforts of (i) developing an appropriate software framework for handling datasets from NURSA laboratories and (ii) designing strategies for the curation and presentation of public data relevant to NRS. To illustrate our approach, we have described here in detail the development of a web-based interface for the NURSA quantitative PCR nuclear receptor expression dataset, incorporating bioinformatics analysis which provides novel perspectives on functional relationships between these molecules. We anticipate that the free and open access of the community to a platform for data mining and hypothesis generation strategies will be a significant contribution to the progress of research in this field.
PMCID: PMC1347392  PMID: 16381851
22.  An Atlas for Schistosoma mansoni Organs and Life-Cycle Stages Using Cell Type-Specific Markers and Confocal Microscopy 
Schistosomiasis (bilharzia) is a tropical disease caused by trematode parasites (Schistosoma) that affects hundreds of millions of people in the developing world. Currently only a single drug (praziquantel) is available to treat this disease, highlighting the importance of developing new techniques to study Schistosoma. While molecular advances, including RNA interference and the availability of complete genome sequences for two Schistosoma species, will help to revolutionize studies of these animals, an array of tools for visualizing the consequences of experimental perturbations on tissue integrity and development needs to be made widely available. To this end, we screened a battery of commercially available stains, antibodies and fluorescently labeled lectins, many of which have not been described previously for analyzing schistosomes, for their ability to label various cell and tissue types in the cercarial stage of S. mansoni. This analysis uncovered more than 20 new markers that label most cercarial tissues, including the tegument, the musculature, the protonephridia, the secretory system and the nervous system. Using these markers we present a high-resolution visual depiction of cercarial anatomy. Examining the effectiveness of a subset of these markers in S. mansoni adults and miracidia, we demonstrate the value of these tools for labeling tissues in a variety of life-cycle stages. The methodologies described here will facilitate functional analyses aimed at understanding fundamental biological processes in these parasites.
Author Summary
Schistosomes are parasitic flatworms that infect hundreds of millions of people worldwide. The development of genomic resources and recent application of functional genomic tools (e.g., global gene expression studies, inhibition of gene expression by RNA interference, and transgenesis) hold the promise of revolutionizing the study of schistosome biology. These advances necessitate the introduction of molecular markers for examining the consequences of manipulating schistosome genes. In this manuscript we report the use of several cell type-specific markers and confocal microscopy for visualizing various schistosome tissues in a variety of life-cycle stages. Our analysis provides an atlas of the major organ systems in three different life-cycle stages in these important parasites. The tools and methodologies reported here are widely available and can be readily adopted by researchers interested in more detailed studies of these organisms. We anticipate that these resources will be particularly useful for detailed phenotypic characterization following gene inhibition or over-expression studies.
PMCID: PMC3050934  PMID: 21408085
23.  A pan-cancer proteomic perspective on The Cancer Genome Atlas 
Nature communications  2014;5:3887.
Protein levels and function are poorly predicted by genomic and transcriptomic analysis of patient tumors. Therefore, direct study of the functional proteome has the potential to provide a wealth of information that complements and extends genomic, epigenomic and transcriptomic analysis in The Cancer Genome Atlas (TCGA) projects. Here we use reverse-phase protein arrays to analyze 3,467 patient samples from 11 TCGA “Pan-Cancer” diseases, using 181 high-quality antibodies that target 128 total proteins and 53 post-translationally modified proteins. The resultant proteomic data is integrated with genomic and transcriptomic analyses of the same samples to identify commonalities, differences, emergent pathways and network biology within and across tumor lineages. In addition, tissue-specific signals are reduced computationally to enhance biomarker and target discovery spanning multiple tumor lineages. This integrative analysis, with an emphasis on pathways and potentially actionable proteins, provides a framework for determining the prognostic, predictive and therapeutic relevance of the functional proteome.
PMCID: PMC4109726  PMID: 24871328
Proteomics; TCGA; Pan-Cancer; protein expression; protein networks
24.  Systematic integration of experimental data and models in systems biology 
BMC Bioinformatics  2010;11:582.
The behaviour of biological systems can be deduced from their mathematical models. However, multiple sources of data in diverse forms are required in the construction of a model in order to define its components and their biochemical reactions, and corresponding parameters. Automating the assembly and use of systems biology models is dependent upon data integration processes involving the interoperation of data and analytical resources.
Taverna workflows have been developed for the automated assembly of quantitative parameterised metabolic networks in the Systems Biology Markup Language (SBML). An SBML model is built in a systematic fashion by the workflows, which start with the construction of a qualitative network using data from a MIRIAM-compliant genome-scale model of yeast metabolism. This is followed by parameterisation of the SBML model with experimental data from two repositories, the SABIO-RK enzyme kinetics database and a database of quantitative experimental results. The models are then calibrated and simulated in workflows that call out to COPASIWS, the web service interface to the COPASI software application for analysing biochemical networks. These systems biology workflows were evaluated for their ability to construct a parameterised model of yeast glycolysis.
Distributed information about metabolic reactions that have been described to MIRIAM standards enables the automated assembly of quantitative systems biology models of metabolic networks based on user-defined criteria. Such data integration processes can be implemented as Taverna workflows to provide a rapid overview of the components and their relationships within a biochemical system.
PMCID: PMC3008707  PMID: 21114840
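The first two stages described in the abstract above (building a qualitative network, then attaching kinetic parameters from a separate data source) can be illustrated with a minimal Python sketch. This is not the authors' Taverna workflow and uses no SBML, SABIO-RK or COPASI interface: the `Reaction` class, the `build_network` and `parameterise` helpers, and all parameter values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """One reaction in a qualitative network (hypothetical structure)."""
    rid: str
    substrates: list
    products: list
    parameters: dict = field(default_factory=dict)


def build_network(records):
    """Stage 1: build a qualitative network keyed by reaction id."""
    return {
        r["id"]: Reaction(r["id"], list(r["substrates"]), list(r["products"]))
        for r in records
    }


def parameterise(network, kinetics):
    """Stage 2: attach parameters by reaction id.

    Returns the ids of reactions for which no kinetic data was found,
    so that gaps in the integrated data are reported rather than hidden.
    """
    missing = []
    for rid, reaction in network.items():
        params = kinetics.get(rid)
        if params is None:
            missing.append(rid)
        else:
            reaction.parameters = dict(params)
    return missing


# Illustrative records for two glycolytic steps; the numbers are made up.
records = [
    {"id": "HXK", "substrates": ["glucose", "ATP"], "products": ["G6P", "ADP"]},
    {"id": "PGI", "substrates": ["G6P"], "products": ["F6P"]},
]
kinetics = {"HXK": {"Km": 0.1, "Vmax": 1.0}}  # assumed values, not measured data

network = build_network(records)
missing = parameterise(network, kinetics)
print("Reactions without kinetic data:", missing)
```

Keeping the list of unparameterised reactions separate from the network is a design choice of this sketch; it makes explicit which parts of a model would still need data before calibration and simulation.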
25.  A self-updating road map of The Cancer Genome Atlas 
Bioinformatics  2013;29(10):1333-1340.
Motivation: Since 2011, The Cancer Genome Atlas’ (TCGA) files have been accessible through HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data. However, to realize this possibility, a continually updated road map of files in the TCGA is required. Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months.
Results: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium’s (W3C) Resource Description Framework (RDF), and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory. The resulting index may be queried using SPARQL, and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. The development of the TCGA Roadmap engine was found to provide specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These specific design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as CaBIG. They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals.
Availability: A prepared dashboard, including links to source code and a SPARQL endpoint, is available at A video tutorial is available at
PMCID: PMC3654710  PMID: 23595662
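The idea of discovering arbitrary subsets of files from their metadata, as described in the abstract above, can be sketched with subject-predicate-object triples in plain Python. This is only an illustration of triple pattern matching in the spirit of a SPARQL query; it is not the TCGA Roadmap engine, which the abstract says uses JavaScript, RDF and SPARQL. The file identifiers, the predicate names `disease` and `platform`, and the helper functions are all hypothetical.

```python
def match(triples, pattern):
    """Return triples matching a (subject, predicate, object) pattern.

    A None entry in the pattern acts as a wildcard.
    """
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def files_where(triples, **criteria):
    """Return sorted file ids whose metadata satisfies every criterion."""
    result = None
    for predicate, value in criteria.items():
        subjects = {s for s, _, _ in match(triples, (None, predicate, value))}
        # Intersect so that a file must satisfy all criteria at once.
        result = subjects if result is None else result & subjects
    return sorted(result or [])


# Made-up metadata index: each triple annotates one file.
triples = [
    ("file1", "disease", "A"), ("file1", "platform", "X"),
    ("file2", "disease", "A"), ("file2", "platform", "Y"),
    ("file3", "disease", "B"), ("file3", "platform", "X"),
]

print(files_where(triples, disease="A", platform="X"))
print(files_where(triples, platform="X"))
```

Because every annotation is a separate triple, new metadata fields can be added to the index without changing the query code, which is one reason a triple-based model suits a resource whose contents change frequently.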
