1.  Emerging topic: flow-related epigenetic regulation of endothelial phenotype through DNA methylation 
Vascular pharmacology  2014;62(2):88-93.
Atherosclerosis is a multi-focal disease; it is associated with arterial curvatures, asymmetries and branches/bifurcations where non-uniform arterial geometry generates patterns of blood flow that are considerably more complex than elsewhere, and are collectively referred to as disturbed flow. Such regions are predisposed to atherosclerosis and are the sites of ‘athero-susceptible’ endothelial cells that express regionally different cell phenotypes than endothelium in nearby athero-protected locations. The regulatory hierarchy of endothelial function includes control at the epigenetic level. MicroRNAs and histone modifications are established epigenetic regulators that respond to disturbed flow. However, very recent reports have linked transcriptional regulation by DNA methylation to endothelial gene expression in disturbed flow in vivo and in vitro. We outline these in the context of site-specific atherosusceptibility mediated by local hemodynamics.
PMCID: PMC4116435  PMID: 24874278
Endothelial gene expression; Hemodynamic disturbed flow; Differential methylation region; Methylome; Atherosclerosis; KLF4; HOX genes
2.  Arterial endothelial methylome: differential DNA methylation in athero-susceptible disturbed flow regions in vivo 
BMC Genomics  2015;16(1):506.
Atherosclerosis is a heterogeneously distributed disease of arteries in which the endothelium plays an important central role. Spatial transcriptome profiling of endothelium in pre-lesional arteries has demonstrated differential phenotypes primed for athero-susceptibility at hemodynamic sites associated with disturbed blood flow. DNA methylation is a powerful epigenetic regulator of endothelial transcription recently associated with flow characteristics. We investigated differential DNA methylation in flow region-specific aortic endothelial cells in vivo in adult domestic male and female swine.
Genome-wide DNA methylation was profiled in endothelial cells (EC) isolated from two robust locations of differing patho-susceptibility: an athero-susceptible site located at the inner curvature of the aortic arch (AA) and an athero-protected region in the descending thoracic (DT) aorta. Complete methylated DNA immunoprecipitation sequencing (MeDIP-seq) identified over 5500 endothelial differentially methylated regions (DMRs). DMR density was significantly enriched in exons and 5’UTR sequences of annotated genes, 60 of which are linked to cardiovascular disease. The set of DMR-associated genes was enriched in transcriptional regulation, pattern specification HOX loci, oxidative stress and the ER stress adaptive pathway, all categories linked to athero-susceptible endothelium. Examination of the relationship between DMR and mRNA in HOXA genes demonstrated a significant inverse relationship between CpG island promoter methylation and gene expression. Methylation-specific PCR (MSP) confirmed differential CpG methylation of HOXA genes, the ER stress gene ATF4, inflammatory regulator microRNA-10a and ARHGAP25 that encodes a negative regulator of Rho GTPases involved in cytoskeleton remodeling. Gender-specific DMRs associated with ciliogenesis that may be linked to defects in cilia development were also identified in AA DMRs.
An endothelial methylome analysis identifies epigenetic DMR characteristics associated with transcriptional regulation in regions of atherosusceptibility in swine aorta in vivo. The data represent the first methylome blueprint for spatio-temporal analyses of lesion susceptibility predisposing to endothelial dysfunction in complex flow environments in vivo.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1656-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4492093  PMID: 26148682
Endothelium; DNA Methylation; Epigenetics; Hemodynamics; Disturbed Flow; HOX Genes; Atherosclerosis; Endothelial Gene Transcription
3.  Ontodog: a web-based ontology community view generation tool 
Bioinformatics  2014;30(9):1340-1342.
Summary: Biomedical ontologies are often very large and complex. Only a subset of the ontology may be needed for a specified application or community. For ontology end users, it is desirable to have community-based labels rather than the labels generated by ontology developers. Ontodog is a web-based system that can generate an ontology subset based on Excel input, and support generation of an ontology community view, which is defined as the whole or a subset of the source ontology with user-specified annotations including user-preferred labels. Ontodog allows users to easily generate community views with minimal ontology knowledge and no programming skills or installation required. Currently >100 ontologies including all OBO Foundry ontologies are available to generate the views based on user needs. We demonstrate the application of Ontodog for the generation of community views using the Ontology for Biomedical Investigations as the source ontology.
PMCID: PMC3998133  PMID: 24413522
4.  Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE) 
Nature biotechnology  2008;26(3):305-312.
One purpose of the biomedical literature is to report results in sufficient detail so that the methods of data collection and analysis can be independently replicated and verified. Here we present for consideration a minimum information specification for gene expression localization experiments, called the “Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE)”. It is modelled after the MIAME (Minimum Information About a Microarray Experiment) specification for microarray experiments. Data specifications like MIAME and MISFISHIE specify the information content without dictating a format for encoding that information. The MISFISHIE specification describes six types of information that should be provided for each experiment: Experimental Design, Biomaterials and Treatments, Reporters, Staining, Imaging Data, and Image Characterizations. This specification has benefited the consortium within which it was initially developed and is expected to benefit the wider research community. We welcome feedback from the scientific community to help improve our proposal.
PMCID: PMC4367930  PMID: 18327244
5.  EuPathDB: The Eukaryotic Pathogen database 
Nucleic Acids Research  2012;41(Database issue):D684-D691.
EuPathDB resources include 11 databases supporting eukaryotic pathogen genomic and functional genomic data, isolate data and phylogenomics. EuPathDB resources are built using the same infrastructure and provide a sophisticated search strategy system enabling complex interrogations of underlying data. Recent advances in EuPathDB resources include the design and implementation of a new data loading workflow, a new database supporting Piroplasmida (i.e. Babesia and Theileria), the addition of large amounts of new data and data types and the incorporation of new analysis tools. New data include genome sequences and annotation, strand-specific RNA-seq data, splice junction predictions (based on RNA-seq), phosphoproteomic data, high-throughput phenotyping data, single nucleotide polymorphism data based on high-throughput sequencing (HTS) and expression quantitative trait loci data. New analysis tools enable users to search for DNA motifs and define genes based on their genomic colocation, view results from searches graphically (i.e. genes mapped to chromosomes or isolates displayed on a map) and analyze data from columns in result tables (word cloud and histogram summaries of column content). The manuscript herein describes updates to EuPathDB since the previous report published in NAR in 2010.
PMCID: PMC3531183  PMID: 23175615
6.  AmoebaDB and MicrosporidiaDB: functional genomic resources for Amoebozoa and Microsporidia species 
Nucleic Acids Research  2010;39(Database issue):D612-D619.
AmoebaDB and MicrosporidiaDB are new functional genomic databases serving the amoebozoa and microsporidia research communities, respectively. AmoebaDB contains the genomes of three Entamoeba species (E. dispar, E. invadens and E. histolytica) and microarray expression data for E. histolytica. MicrosporidiaDB contains the genomes of Encephalitozoon cuniculi, E. intestinalis and E. bieneusi. The databases belong to the National Institute of Allergy and Infectious Diseases (NIAID)-funded EuPathDB Bioinformatics Resource Center family of integrated databases and assume the same architectural and graphical design as other EuPathDB resources such as PlasmoDB and TriTrypDB. Importantly, they utilize the graphical strategy builder that affords a database user the ability to ask complex multi-data-type questions with relative ease and versatility. Genomic scale data can be queried based on BLAST searches, annotation keywords and gene ID searches, GO terms, sequence motifs, protein characteristics, phylogenetic relationships and functional data such as transcript (microarray and EST evidence) and protein expression data. Search strategies can be saved within a user’s profile for future retrieval and may also be shared with other researchers using a unique strategy web address.
PMCID: PMC3013638  PMID: 20974635
7.  Data Standards for Omics Data: The Basis of Data Sharing and Reuse 
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
PMCID: PMC4152841  PMID: 21370078
Data sharing; Data exchange; Data standards; MGED; MIAME; Ontology; Data format; Microarray; Proteomics; Metabolomics
8.  CLO: The cell line ontology 
Cell lines have been widely used in biomedical research. The community-based Cell Line Ontology (CLO) is a member of the OBO Foundry library that covers the domain of cell lines. Since its publication two years ago, significant updates have been made, including new groups joining the CLO consortium, new cell line cells, upper level alignment with the Cell Ontology (CL) and the Ontology for Biomedical Investigations (OBI), and logical extensions.
Construction and content
Collaboration among the CLO, CL, and OBI has established consensus definitions of cell line-specific terms such as ‘cell line’, ‘cell line cell’, ‘cell line culturing’, and ‘mortal’ vs. ‘immortal cell line cell’. A cell line is a genetically stable cultured cell population that contains individual cell line cells. The hierarchical structure of the CLO is built based on the hierarchy of the in vivo cell types defined in CL and tissue types (from which cell line cells are derived) defined in the UBERON cross-species anatomy ontology. The new hierarchical structure makes it easier to browse, query, and perform automated classification. We have recently added classes representing more than 2,000 cell line cells from the RIKEN BRC Cell Bank to CLO. Overall, the CLO now contains ~38,000 classes of specific cell line cells derived from over 200 in vivo cell types from various organisms.
Utility and discussion
The CLO has been applied to different biomedical research studies. Example case studies include annotation and analysis of EBI ArrayExpress data, bioassays, and host-vaccine/pathogen interaction. CLO’s utility goes beyond a catalogue of cell line types. The alignment of the CLO with related ontologies combined with the use of ontological reasoners will support sophisticated inferencing to advance translational informatics development.
PMCID: PMC4387853  PMID: 25852852
Cell line; Cell line cell; Immortal cell line cell; Mortal cell line cell; Cell line cell culturing; Anatomy
9.  Insm1 promotes endocrine cell differentiation by modulating the expression of a network of genes that includes Neurog3 and Ripply3 
Development (Cambridge, England)  2014;141(15):2939-2949.
Insulinoma associated 1 (Insm1) plays an important role in regulating the development of cells in the central and peripheral nervous systems, olfactory epithelium and endocrine pancreas. To better define the role of Insm1 in pancreatic endocrine cell development we generated mice with an Insm1GFPCre reporter allele and used them to study Insm1-expressing and null populations. Endocrine progenitor cells lacking Insm1 were less differentiated and exhibited broad defects in hormone production, cell proliferation and cell migration. Embryos lacking Insm1 contained greater amounts of a non-coding Neurog3 mRNA splice variant and had fewer Neurog3/Insm1 co-expressing progenitor cells, suggesting that Insm1 positively regulates Neurog3. Moreover, endocrine progenitor cells that express either high or low levels of Pdx1, and thus may be biased towards the formation of specific cell lineages, exhibited cell type-specific differences in the genes regulated by Insm1. Analysis of the function of Ripply3, an Insm1-regulated gene enriched in the Pdx1-high cell population, revealed that it negatively regulates the proliferation of early endocrine cells. Taken together, these findings indicate that in developing pancreatic endocrine cells Insm1 promotes the transition from a ductal progenitor to a committed endocrine cell by repressing a progenitor cell program and activating genes essential for RNA splicing, cell migration, controlled cellular proliferation, vasculogenesis, extracellular matrix and hormone secretion.
PMCID: PMC4197673  PMID: 25053427
Pancreas development; Endocrine progenitor cells; Gene expression; Transcription factors; Mouse
10.  EuPathDB: a portal to eukaryotic pathogen databases 
Nucleic Acids Research  2009;38(Database issue):D415-D419.
EuPathDB (formerly ApiDB) is an integrated database covering the eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera. The most recent release of EuPathDB includes updates and changes affecting data content, infrastructure and the user interface, improving data access and enhancing the user experience. EuPathDB currently supports more than 80 searches and the recently-implemented ‘search strategy’ system enables users to construct complex multi-step searches via a graphical interface. Search results are dynamically displayed as the strategy is constructed or modified, and can be downloaded, saved, revised, or shared with other database users.
PMCID: PMC2808945  PMID: 19914931
11.  Standardized Metadata for Human Pathogen/Vector Genomic Sequences 
PLoS ONE  2014;9(6):e99979.
High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
PMCID: PMC4061050  PMID: 24936976
12.  PlasmoDB: a functional genomic database for malaria parasites 
Nucleic Acids Research  2008;37(Database issue):D539-D543.
PlasmoDB is a functional genomic database for Plasmodium spp. that provides a resource for data analysis and visualization in a gene-by-gene or genome-wide scale. PlasmoDB belongs to a family of genomic resources that are housed under the EuPathDB Bioinformatics Resource Center (BRC) umbrella. The latest release, PlasmoDB 5.5, contains numerous new data types from several broad categories: annotated genomes, evidence of transcription, proteomics evidence, protein function evidence, population biology and evolution. Data in PlasmoDB can be queried by selecting the data of interest from a query grid or drop down menus. Various results can then be combined with each other on the query history page. Search results can be downloaded with associated functional data and registered users can store their query history for future retrieval or analysis.
PMCID: PMC2686598  PMID: 18957442
13.  GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis 
Nucleic Acids Research  2008;37(Database issue):D526-D530.
GiardiaDB and TrichDB house the genome databases for Giardia lamblia and Trichomonas vaginalis, respectively, and represent the latest additions to the EuPathDB family of functional genomic databases. GiardiaDB and TrichDB employ the same framework as other EuPathDB sites (CryptoDB, PlasmoDB and ToxoDB), supporting fully integrated and searchable databases. Genomic-scale data available via these resources may be queried based on BLAST searches, annotation keywords and gene ID searches, GO terms, sequence motifs and other protein characteristics. Functional queries may also be formulated, based on transcript and protein expression data from a variety of platforms. Phylogenetic relationships may also be interrogated. The ability to combine the results from independent queries, and to store queries and query results for future use facilitates complex, genome-wide mining of functional genomic data.
PMCID: PMC2686445  PMID: 18824479
14.  Discovery Approaches to UPR in Athero-Susceptible Endothelium In Vivo 
Methods in enzymology  2011;489:10.1016/B978-0-12-385116-1.00007-8.
The endothelium is a monolayer of cells that lines the entire inner surface of the cardiovascular and lymphatic circulations where it controls normal physiological functions through both systemic and local regulation. Endothelial phenotypes are heterogeneous, dynamic and malleable, properties that in large- and medium-sized arteries lead to a central role in the development of focal and regional atherosclerosis. The endothelial phenotype in athero-susceptible sites is different from that in nearby athero-resistant regions. Understanding the in vivo gene, protein, and metabolic expression profiles of susceptible endothelium is, therefore, an important spatiotemporal challenge in atherosclerosis research. Recent studies have demonstrated that endoplasmic reticulum (ER) stress and the UPR are characteristics of susceptible endothelium. Here, we outline global genomic profiling, pathway analyses, and gene connectivity approaches to the identification of UPR and associated pathways as discrete markers of athero-susceptibility in arterial endothelium.
PMCID: PMC3833809  PMID: 21266227
15.  Dual lineage-specific expression of Sox17 during mouse embryogenesis 
Stem cells (Dayton, Ohio)  2012;30(10):2297-2308.
Sox17 is essential for both endoderm development and fetal hematopoietic stem cell (HSC) maintenance. While endoderm-derived organs are well known to originate from Sox17-expressing cells it is less certain whether fetal HSCs also originate from Sox17-expressing cells. By generating a Sox17GFPCre allele and using it to assess the fate of Sox17-expressing cells during embryogenesis we confirmed that both endodermal and a part of definitive hematopoietic cells are derived from Sox17-positive cells. Prior to E9.5 the expression of Sox17 is restricted to the endoderm lineage. However, at E9.5 Sox17 is expressed in the endothelial cells (ECs) at the para-aortic splanchnopleural (P-Sp) region that contribute to the formation of HSCs at a later stage. The identification of two distinct progenitor cell populations that express Sox17 at E9.5 was confirmed using FACS together with RNA-Seq to determine the gene expression profiles of the two cell populations. Interestingly, this analysis revealed differences in the RNA processing of the Sox17 mRNA during embryogenesis. Taken together, these results indicate that Sox17 is expressed in progenitor cells derived from two different germ layers, further demonstrating the complex expression pattern of this gene and suggesting caution when using Sox17 as a lineage-specific marker.
PMCID: PMC3448801  PMID: 22865702
16.  Stat and interferon genes identified by network analysis differentially regulate primitive and definitive erythropoiesis 
BMC Systems Biology  2013;7:38.
Hematopoietic ontogeny is characterized by overlapping waves of primitive, fetal definitive, and adult definitive erythroid lineages. Our aim is to identify differences in the transcriptional control of these distinct erythroid cell maturation pathways by inferring and analyzing gene-interaction networks from lineage-specific expression datasets. Inferred networks are strongly connected and do not fit a scale-free model, making it difficult to identify essential regulators using the hub-essentiality standard.
We employed a semi-supervised machine learning approach to integrate measures of network topology with expression data to score gene essentiality. The algorithm was trained and tested on the adult and fetal definitive erythroid lineages. When applied to the primitive erythroid lineage, 144 high scoring transcription factors were found to be differentially expressed between the primitive and adult definitive erythroid lineages, including all expressed STAT-family members. Differential responses of primitive and definitive erythroblasts to a Stat3 inhibitor and IFNγ in vitro supported the results of the computational analysis. Further investigation of the original expression data revealed a striking signature of Stat1-related genes in the adult definitive erythroid network. Among the potential pathways known to utilize Stat1, interferon (IFN) signaling-related genes were expressed almost exclusively within the adult definitive erythroid network.
In vitro results support the computational prediction that differential regulation and downstream effectors of STAT signaling are key factors that distinguish the transcriptional control of primitive and definitive erythroid cell maturation.
PMCID: PMC3668222  PMID: 23675896
Primitive erythropoiesis; Definitive erythropoiesis; Stat1; Stat3; IFN-signaling; Gene-regulatory networks; Co-expression network inference
17.  Comparative analysis of RNA-Seq alignment algorithms and the RNA-Seq unified mapper (RUM) 
Bioinformatics  2011;27(18):2518-2528.
Motivation: A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not been performed previously.
Results: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription–polymerase chain reaction (RT–PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability.
Availability: The RUM pipeline is distributed via the Amazon Cloud and for computing clusters using the Sun Grid Engine.
Supplementary Information: The RNA-Seq sequence reads described in the article are deposited at GEO, accession GSE26248.
PMCID: PMC3167048  PMID: 21775302
18.  Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups 
OrthoMCL is an algorithm for grouping proteins into ortholog groups based on their sequence similarity. OrthoMCL-DB is a public database that allows users to browse and view ortholog groups that were pre-computed using the OrthoMCL algorithm. Version 4 of this database contained 116,536 ortholog groups clustered from 1,270,853 proteins obtained from 88 eukaryotic genomes, 16 archaeal genomes and 34 bacterial genomes. Future versions of OrthoMCL-DB will include more proteomes as more genomes are sequenced. Here, we describe how you can group your proteins of interest into ortholog clusters using two different means provided by the OrthoMCL system. The OrthoMCL-DB website has a tool for uploading and grouping a set of protein sequences, typically representing a proteome. This method maps the uploaded proteins to existing groups in OrthoMCL-DB. Alternatively, if you have proteins from a set of genomes that need to be grouped, you can download, install and run the standalone OrthoMCL software.
PMCID: PMC3196566  PMID: 21901743
OrthoMCL; ortholog groups; paralog; proteome; Markov clustering; reciprocal best hits; MCL
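The keywords above mention reciprocal best hits as one ingredient of ortholog grouping. The idea can be sketched as follows. This is a minimal illustration only, not the OrthoMCL software: the protein identifiers, the similarity scores and the function names are all hypothetical, and the Markov clustering step is not shown.

```python
def best_hits(scores):
    """Map each query protein to its single highest-scoring hit.

    `scores` maps (query, hit) pairs to a similarity score (higher is better).
    """
    best = {}
    for (query, hit), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (hit, score)
    return {query: hit for query, (hit, _) in best.items()}


def reciprocal_best_hits(scores):
    """Return unordered pairs in which each protein is the other's best hit."""
    best = best_hits(scores)
    pairs = set()
    for query, hit in best.items():
        if best.get(hit) == query:
            pairs.add(tuple(sorted((query, hit))))
    return pairs


# Hypothetical pairwise similarity scores between proteins of two species.
scores = {
    ("spA|p1", "spB|p1"): 250.0,
    ("spA|p1", "spB|p2"): 90.0,
    ("spB|p1", "spA|p1"): 245.0,
    ("spB|p2", "spA|p1"): 88.0,
    ("spA|p2", "spB|p2"): 40.0,
}

rbh_pairs = reciprocal_best_hits(scores)
```

Here only `spA|p1` and `spB|p1` pick each other as best hit, so they form the one reciprocal pair; `spB|p2` points at `spA|p1`, but that choice is not returned.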
19.  Coronary Artery Endothelial Transcriptome In Vivo: Identification of Endoplasmic Reticulum Stress and Enhanced ROS by Gene Connectivity Network Analysis 
Endothelial function is central to the localization of atherosclerosis. The in vivo endothelial phenotypic footprints of arterial bed identity and site-specific athero-susceptibility are addressed.
Methods and Results
98 endothelial cell samples from 13 discrete coronary and non-coronary arterial regions of varying susceptibilities to atherosclerosis were isolated from 76 normal swine. Transcript profiles were analyzed to determine the steady state in vivo endothelial phenotypes. An unsupervised systems biology approach utilizing weighted gene co-expression networks determined highly correlated endothelial genes. Connectivity network analysis identified 19 gene modules, 12 of which showed significant association with circulatory bed classification. Differential expression of 1,300 genes between coronary and non-coronary artery endothelium suggested distinct coronary endothelial phenotypes with highest significance expressed in gene modules enriched for biological functions related to endoplasmic reticulum (ER) stress and unfolded protein binding, regulation of transcription and translation, and redox homeostasis. Furthermore, within coronary arteries comparison of endothelial transcript profiles of susceptible proximal regions to protected distal regions suggested the presence of ER stress conditions in susceptible sites. Accumulation of reactive oxygen species (ROS) throughout coronary endothelium was greater than in non-coronary endothelium consistent with coronary artery ER stress and the lower endothelial expression of anti-oxidant genes in coronary arteries.
Gene connectivity analyses discriminated between coronary and non-coronary endothelial transcript profiles and identified differential transcript levels associated with increased ER and oxidative stress in coronary arteries, consistent with enhanced susceptibility to atherosclerosis.
PMCID: PMC3116084  PMID: 21493819
weighted gene co-expression networks; microarray; endoplasmic reticulum stress; unfolded protein response; reactive oxygen species
20.  The Ontology for Parasite Lifecycle (OPL): towards a consistent vocabulary of lifecycle stages in parasitic organisms 
Genome sequencing of many eukaryotic pathogens and the volume of data available on public resources have created a clear requirement for a consistent vocabulary to describe the range of developmental forms of parasites. Consistent labeling of experimental data and external data, in databases and the literature, is essential for integration, cross database comparison, and knowledge discovery. The primary objective of this work was to develop a dynamic and controlled vocabulary that can be used for various parasites. The paper describes the Ontology for Parasite Lifecycle (OPL) and discusses its application in parasite research.
The OPL is based on the Basic Formal Ontology (BFO) and follows the rules set by the OBO Foundry consortium. The first version of the OPL models complex life cycle stage details of a range of parasites, such as Trypanosoma sp., Leishmania sp., Plasmodium sp., and Schistosoma sp. In addition, the ontology also models necessary contextual details, such as host information, vector information, and anatomical locations. OPL is primarily designed to serve as a reference ontology for parasite life cycle stages that can be used for database annotation purposes and in the lab for data integration or information retrieval as exemplified in the application section below.
OPL is freely available and has been submitted to the BioPortal site of NCBO and to the OBO Foundry. We believe that database and phenotype annotations using OPL will help run fundamental queries on databases to know more about gene functions and to find intervention targets for various parasites. The OPL is under continuous development and new parasites and/or terms are being added.
PMCID: PMC3488002  PMID: 22621763
21.  AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments 
The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis: clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download.
PMCID: PMC3244265  PMID: 22190598
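As an illustration of the query-by-example idea described in the AnnotCompute abstract above, the following is a minimal sketch, not the authors' method. The abstract does not state which dissimilarity measures AnnotCompute uses, so Jaccard distance over sets of annotation terms is swapped in here as an assumption. The experiment identifiers and annotation terms are hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """Jaccard distance between two sets of annotation terms (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty annotation sets are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def query_by_example(query_id, experiments):
    """Rank all other experiments by increasing dissimilarity to the query."""
    query_terms = experiments[query_id]
    scored = [
        (jaccard_dissimilarity(query_terms, terms), exp_id)
        for exp_id, terms in experiments.items()
        if exp_id != query_id
    ]
    return [exp_id for _, exp_id in sorted(scored)]


# Hypothetical experiments, each annotated with controlled-vocabulary terms.
experiments = {
    "exp_A": {"Homo sapiens", "time_course", "RNA"},
    "exp_B": {"Homo sapiens", "time_course", "DNA"},
    "exp_C": {"Mus musculus", "dose_response", "DNA"},
}

ranking = query_by_example("exp_A", experiments)
print(ranking)
```

In this toy data set, exp_B shares two of four distinct terms with exp_A and exp_C shares none, so exp_B ranks first. The same pairwise dissimilarities could also be fed to a clustering routine.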
23.  The Strategies WDK: a graphical search interface and web development kit for functional genomics databases 
Web sites associated with the Eukaryotic Pathogen Bioinformatics Resource Center ( have recently introduced a graphical user interface, the Strategies WDK, intended to make advanced searching and set and interval operations easy and accessible to all users. With a design guided by usability studies, the system helps motivate researchers to perform dynamic computational experiments and explore relationships across data sets. For example, PlasmoDB users seeking novel therapeutic targets may wish to locate putative enzymes that distinguish pathogens from their hosts, and that are expressed during appropriate developmental stages. When a researcher runs one of the approximately 100 searches available on the site, the search is presented as a first step in a strategy. The strategy is extended by running additional searches, which are combined with set operators (union, intersect or minus), or genomic interval operators (overlap, contains). A graphical display uses Venn diagrams to make the strategy’s flow obvious. The interface facilitates interactive adjustment of the component searches with changes propagating forward through the strategy. Users may save their strategies, creating protocols that can be shared with colleagues. The strategy system has now been deployed on all EuPathDB databases, and successfully deployed by other projects. The Strategies WDK uses a configurable MVC architecture that is compatible with most genomics and biological warehouse databases, and is available for download at
Database URL:
PMCID: PMC3122067  PMID: 21705364
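The set and interval operators named in the Strategies WDK abstract above can be sketched in a few lines. This is a hedged illustration of the operators only, not the WDK's implementation or API. The gene identifiers, search names, the Interval class, and the inclusive-coordinate convention are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval with inclusive coordinates (assumed convention)."""
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """True if the two intervals share at least one position on the same sequence."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` on the same sequence."""
    return (outer.chrom == inner.chrom
            and outer.start <= inner.start
            and inner.end <= outer.end)


# Hypothetical results of three searches, each a set of gene IDs.
putative_enzymes = {"gene1", "gene2", "gene3"}
stage_expressed = {"gene2", "gene3", "gene4"}
host_orthologs = {"gene3"}

# Each step combines the previous result with a new search.
step2 = putative_enzymes & stage_expressed  # intersect
step3 = step2 - host_orthologs              # minus
either = putative_enzymes | stage_expressed # union

# Interval operators on hypothetical features.
gene = Interval("chr1", 100, 500)
snp = Interval("chr1", 200, 200)
print(step3, contains(gene, snp), overlaps(gene, Interval("chr1", 450, 600)))
```

Because each step is derived from the one before it, changing an early search and recomputing the later steps mirrors the forward propagation of changes that the abstract describes.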
24.  Chronic endoplasmic reticulum stress activates unfolded protein response in arterial endothelium in regions of susceptibility to atherosclerosis 
Circulation research  2009;105(5):453-461.
Rationale
Endothelial function and dysfunction are central to the focal origin and regional development of atherosclerosis; however, an in vivo endothelial phenotypic footprint of susceptibility to atherosclerosis preceding pathological change remains elusive.
Objective
To conduct a comparative multi-site genomics study of arterial endothelial phenotype in athero-susceptible and athero-protected regions.
Methods and Results
Transcript profiles of freshly isolated endothelial cells from 7 discrete arterial regions in normal swine were analyzed to determine the steady state in vivo endothelial phenotypes in regions of varying susceptibilities to atherosclerosis. The most abundant common feature of the endothelium of all athero-susceptible regions was the upregulation of genes associated with endoplasmic reticulum (ER) stress. The unfolded protein response (UPR) pathway, induced by ER stress, was therefore investigated in detail in endothelium of the athero-susceptible aortic arch and was found to be partially activated. ER transmembrane signal transducers IRE1α and ATF6α and their downstream effectors, but not PERK, were activated concomitant with a higher transcript expression of protein folding enzymes and chaperones, indicative of ER stress in vivo.
Conclusions
The findings demonstrate the prevalence of chronic endothelial ER stress and activated UPR in vivo at athero-susceptible arterial sites. We propose that chronic localized biological stress is linked to spatial susceptibility of the endothelium to the initiation of atherosclerosis.
PMCID: PMC2746924  PMID: 19661457
hemodynamics; DNA microarrays; gene expression
25.  Annotare—a tool for annotating high-throughput biomedical investigations and resulting data 
Bioinformatics  2010;26(19):2470-2471.
Summary: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis.
Availability and Implementation: Annotare is available from under the terms of the open-source MIT License ( It has been tested on both Mac and Windows.
PMCID: PMC2944206  PMID: 20733062
