Search tips
Search criteria

Results 1-25 (27)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
more »
1.  PortEco: a resource for exploring bacterial biology through high-throughput data and analysis tools 
Nucleic Acids Research  2013;42(D1):D677-D684.
PortEco ( aims to collect, curate and provide data and analysis tools to support basic biological research in Escherichia coli (and eventually other bacterial systems). PortEco is implemented as a ‘virtual’ model organism database that provides a single unified interface to the user, while integrating information from a variety of sources. The main focus of PortEco is to enable broad use of the growing number of high-throughput experiments available for E. coli, and to leverage community annotation through the EcoliWiki and GONUTS systems. Currently, PortEco includes curated data from hundreds of genome-wide RNA expression studies, from high-throughput phenotyping of single-gene knockouts under hundreds of annotated conditions, from chromatin immunoprecipitation experiments for tens of different DNA-binding factors and from ribosome profiling experiments that yield insights into protein expression. Conditions have been annotated with a consistent vocabulary, and data have been consistently normalized to enable users to find, compare and interpret relevant experiments. PortEco includes tools for data analysis, including clustering, enrichment analysis and exploration via genome browsers. PortEco search and data analysis tools are extensively linked to the curated gene, metabolic pathway and regulation content at its sister site, EcoCyc.
PMCID: PMC3965092  PMID: 24285306
2.  Domain-Specific Data Sharing in Neuroscience: What do we have to learn from each other? 
Neuroinformatics  2008;6(2):117-121.
Molecular biology and genomics have made notable strides in the sharing of primary data and resources. In other domains of neuroscience research, however, there has been resistance to adopting formalized strategies for data exchange, archiving, and availability. In this article, we discuss how neuroscience domains might follow the lead of molecular biology on what has been successful and what has failed in active data sharing. This considers not only the technical challenges but also the sociological concerns in making it possible. Though, not a pain-free process, with increased data availability, scientists from multiple fields can enjoy greater opportunity for novel discoveries about the brain in health and disease.
PMCID: PMC3319052  PMID: 18473189
3.  Comparison of the Complete Protein Sets of Worm and Yeast: Orthology and Divergence 
Science (New York, N.Y.)  1998;282(5396):2022-2028.
Comparative analysis of predicted protein sequences encoded by the genomes of Caenorhabditis elegans and Saccharomyces cerevisiae suggests that most of the core biological functions are carried out by orthologous proteins (proteins of different species that can be traced back to a common ancestor) that occur in comparable numbers. The specialized processes of signal transduction and regulatory control that are unique to the multicellular worm appear to use novel proteins, many of which re-use conserved domains. Major expansion of the number of some of these domains seen in the worm may have contributed to the advent of multicellularity. The proteins conserved in yeast and worm are likely to have orthologs throughout eukaryotes; in contrast, the proteins unique to the worm may well define metazoans.
PMCID: PMC3057080  PMID: 9851918
4.  Genetic and physical maps of Saccharomyces cerevisiae 
Nature  1997;387(6632 Suppl):67-73.
Genetic and physical maps for the 16 chromosomes of Saccharomyces cerevisiae are presented. The genetic map is the result of 40 years of genetic analysis. The physical map was produced from the results of an international systematic sequencing effort. The data for the maps are accessible electronically from the Saccharomyces Genome Database (SGD:
PMCID: PMC3057085  PMID: 9169866
5.  Genome comparisons highlight similarity and diversity within the eukaryotic kingdoms 
In 2000, the number of completely sequenced eukaryotic genomes increased to four. The addition of Drosophila and Arabidopsis into this cohort permits additional insights into the processes that have shaped evolution. Analysis and comparisons of both completed genomes and partially sequenced genomes have already shed light on mechanisms such as gene duplication and gene loss that have long been hypothesized to be major forces in speciation. Indeed, duplicate gene pairs in Saccharomyces, Arabidopsis, Caenorhabditis and Drosophila are high: 30%, 60%, 48% and 40%, respectively. Evidence of horizontal gene-transfer, thought to be a major evolutionary force in bacteria, has been found in Arabidopsis. The release of the ‘first draft’ of the human genome sequence in 2000 heralds a new stage of biological study. Understanding the as-yet-unannotated human genome will be largely based on conclusions, techniques and tools developed during the analysis and comparison of the genome of these four model organisms.
PMCID: PMC3040119  PMID: 11166654
6.  Saccharomyces cerevisiae S288C genome annotation: a working hypothesis 
Yeast (Chichester, England)  2006;23(12):857-865.
The S. cerevisiae genome is the most well-characterized eukaryotic genome and one of the simplest in terms of identifying open reading frames (ORFs), yet its primary annotation has been updated continually in the decade since its initial release in 1996 (Goffeau et al., 1996). The Saccharomyces Genome Database (SGD; (Hirschman et al., 2006), the community-designated repository for this reference genome, strives to ensure that the S. cerevisiae annotation is as accurate and useful as possible. At SGD, the S. cerevisiae genome sequence and annotation are treated as a working hypothesis, which must be repeatedly tested and refined. In this paper, in celebration of the tenth anniversary of the completion of the S. cerevisiae genome sequence, we discuss the ways in which the S. cerevisiae sequence and annotation have changed, consider the multiple sources of experimental and comparative data on which these changes are based, and describe our methods for evaluating, incorporating and documenting these new data.
PMCID: PMC3040122  PMID: 17001629
S. cerevisiae; genome sequence; genome annotation; comparative genomics; exon/intron boundaries
7.  Expanding Yeast Knowledge Online 
Yeast (Chichester, England)  1998;14(16):1453-1469.
The completion of the Saccharomyces cerevisiae genome sequencing project11 and the continued development of improved technology for large-scale genome analysis have led to tremendous growth in the amount of new yeast genetics and molecular biology data. Efficient organization, presentation, and dissemination of this information are essential if researchers are to exploit this knowledge. In addition, the development of tools that provide efficient analysis of this information and link it with pertinent information from other systems is becoming increasingly important at a time when the complete genome sequences of other organisms are becoming available. The aim of this review is to familiarize biologists with the type of data resources currently available on the World Wide Web (WWW).
PMCID: PMC3037831  PMID: 9885151
World Wide Web; Saccharomyces Genome Database; Munich Information Center for Protein Sequences; Yeast Protein Database
8.  Gene Ontology: tool for the unification of biology 
Nature genetics  2000;25(1):25-29.
Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web ( are being constructed: biological process, molecular function and cellular component.
PMCID: PMC3037419  PMID: 10802651
9.  Annotare—a tool for annotating high-throughput biomedical investigations and resulting data 
Bioinformatics  2010;26(19):2470-2471.
Summary: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis.
Availability and Implementation: Annotare is available from under the terms of the open-source MIT License ( It has been tested on both Mac and Windows.
PMCID: PMC2944206  PMID: 20733062
10.  Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project 
Nature biotechnology  2008;26(8):889-896.
The Minimum Information for Biological and Biomedical Investigations (MIBBI) project provides a resource for those exploring the range of extant minimum information checklists and fosters coordinated development of such checklists.
PMCID: PMC2771753  PMID: 18688244
11.  Implementation of GenePattern within the Stanford Microarray Database 
Nucleic Acids Research  2008;37(Database issue):D898-D901.
Hundreds of researchers across the world use the Stanford Microarray Database (SMD; to store, annotate, view, analyze and share microarray data. In addition to providing registered users at Stanford access to their own data, SMD also provides access to public data, and tools with which to analyze those data, to any public user anywhere in the world. Previously, the addition of new microarray data analysis tools to SMD has been limited by available engineering resources, and in addition, the existing suite of tools did not provide a simple way to design, execute and share analysis pipelines, or to document such pipelines for the purposes of publication. To address this, we have incorporated the GenePattern software package directly into SMD, providing access to many new analysis tools, as well as a plug-in architecture that allows users to directly integrate and share additional tools through SMD. In this article, we describe our implementation of the GenePattern microarray analysis software package into the SMD code base. This extension is available with the SMD source code that is fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD with an enriched data analysis capability.
PMCID: PMC2686537  PMID: 18953035
12.  TB database: an integrated platform for tuberculosis research 
Nucleic Acids Research  2008;37(Database issue):D499-D508.
The effective control of tuberculosis (TB) has been thwarted by the need for prolonged, complex and potentially toxic drug regimens, by reliance on an inefficient vaccine and by the absence of biomarkers of clinical status. The promise of the genomics era for TB control is substantial, but has been hindered by the lack of a central repository that collects and integrates genomic and experimental data about this organism in a way that can be readily accessed and analyzed. The Tuberculosis Database (TBDB) is an integrated database providing access to TB genomic data and resources, relevant to the discovery and development of TB drugs, vaccines and biomarkers. The current release of TBDB houses genome sequence data and annotations for 28 different Mycobacterium tuberculosis strains and related bacteria. TBDB stores pre- and post-publication gene-expression data from M. tuberculosis and its close relatives. TBDB currently hosts data for nearly 1500 public tuberculosis microarrays and 260 arrays for Streptomyces. In addition, TBDB provides access to a suite of comparative genomics and microarray analysis software. By bringing together M. tuberculosis genome annotation and gene-expression data with a suite of analysis tools, TBDB ( provides a unique discovery platform for TB research.
PMCID: PMC2686437  PMID: 18835847
13.  The XBabelPhish MAGE-ML and XML Translator 
BMC Bioinformatics  2008;9:28.
MAGE-ML has been promoted as a standard format for describing microarray experiments and the data they produce. Two characteristics of the MAGE-ML format compromise its use as a universal standard: First, MAGE-ML files are exceptionally large – too large to be easily read by most people, and often too large to be read by most software programs. Second, the MAGE-ML standard permits many ways of representing the same information. As a result, different producers of MAGE-ML create different documents describing the same experiment and its data. Recognizing all the variants is an unwieldy software engineering task, resulting in software packages that can read and process MAGE-ML from some, but not all producers. This Tower of MAGE-ML Babel bars the unencumbered exchange of microarray experiment descriptions couched in MAGE-ML.
We have developed XBabelPhish – an XQuery-based technology for translating one MAGE-ML variant into another. XBabelPhish's use is not restricted to translating MAGE-ML documents. It can transform XML files independent of their DTD, XML schema, or semantic content. Moreover, it is designed to work on very large (> 200 Mb.) files, which are common in the world of MAGE-ML.
XBabelPhish provides a way to inter-translate MAGE-ML variants for improved interchange of microarray experiment information. More generally, it can be used to transform most XML files, including very large ones that exceed the capacity of most XML tools.
PMCID: PMC2233607  PMID: 18205924
14.  The Stanford Tissue Microarray Database 
Nucleic Acids Research  2007;36(Database issue):D871-D877.
The Stanford Tissue Microarray Database (TMAD; is a public resource for disseminating annotated tissue images and associated expression data. Stanford University pathologists, researchers and their collaborators worldwide use TMAD for designing, viewing, scoring and analyzing their tissue microarrays. The use of tissue microarrays allows hundreds of human tissue cores to be simultaneously probed by antibodies to detect protein abundance (Immunohistochemistry; IHC), or by labeled nucleic acids (in situ hybridization; ISH) to detect transcript abundance. TMAD archives multi-wavelength fluorescence and bright-field images of tissue microarrays for scoring and analysis. As of July 2007, TMAD contained 205 161 images archiving 349 distinct probes on 1488 tissue microarray slides. Of these, 31 306 images for 68 probes on 125 slides have been released to the public. To date, 12 publications have been based on these raw public data. TMAD incorporates the NCI Thesaurus ontology for searching tissues in the cancer domain. Image processing researchers can extract images and scores for training and testing classification algorithms. The production server uses the Apache HTTP Server, Oracle Database and Perl application code. Source code is available to interested researchers under a no-cost license.
PMCID: PMC2238948  PMID: 17989087
15.  OntologyWidget – a reusable, embeddable widget for easily locating ontology terms 
BMC Bioinformatics  2007;8:338.
Biomedical ontologies are being widely used to annotate biological data in a computer-accessible, consistent and well-defined manner. However, due to their size and complexity, annotating data with appropriate terms from an ontology is often challenging for experts and non-experts alike, because there exist few tools that allow one to quickly find relevant ontology terms to easily populate a web form.
We have produced a tool, OntologyWidget, which allows users to rapidly search for and browse ontology terms. OntologyWidget can easily be embedded in other web-based applications. OntologyWidget is written using AJAX (Asynchronous JavaScript and XML) and has two related elements. The first is a dynamic auto-complete ontology search feature. As a user enters characters into the search box, the appropriate ontology is queried remotely for terms that match the typed-in text, and the query results populate a drop-down list with all potential matches. Upon selection of a term from the list, the user can locate this term within a generic and dynamic ontology browser, which comprises the second element of the tool. The ontology browser shows the paths from a selected term to the root as well as parent/child tree hierarchies. We have implemented web services at the Stanford Microarray Database (SMD), which provide the OntologyWidget with access to over 40 ontologies from the Open Biological Ontology (OBO) website [1]. Each ontology is updated weekly. Adopters of the OntologyWidget can either use SMD's web services, or elect to rely on their own. Deploying the OntologyWidget can be accomplished in three simple steps: (1) install Apache Tomcat [2] on one's web server, (2) download and install the OntologyWidget servlet stub that provides access to the SMD ontology web services, and (3) create an html (HyperText Markup Language) file that refers to the OntologyWidget using a simple, well-defined format.
We have developed OntologyWidget, an easy-to-use ontology search and display tool that can be used on any web page by creating a simple html description. OntologyWidget provides a rapid auto-complete search function paired with an interactive tree display. We have developed a web service layer that communicates between the web page interface and a database of ontology terms. We currently store 40 of the ontologies from the OBO website [1], as well as a several others. These ontologies are automatically updated on a weekly basis. OntologyWidget can be used in any web-based application to take advantage of the ontologies we provide via web services or any other ontology that is provided elsewhere in the correct format. The full source code for the JavaScript and description of the OntologyWidget is available from .
PMCID: PMC2080642  PMID: 17854506
16.  The Stanford Microarray Database: implementation of new analysis tools and open source release of software 
Nucleic Acids Research  2006;35(Database issue):D766-D770.
The Stanford Microarray Database (SMD; ) is a research tool and archive that allows hundreds of researchers worldwide to store, annotate, analyze and share data generated by microarray technology. SMD supports most major microarray platforms, and is MIAME-supportive and can export or import MAGE-ML. The primary mission of SMD is to be a research tool that supports researchers from the point of data generation to data publication and dissemination, but it also provides unrestricted access to analysis tools and public data from 300 publications. In addition to supporting ongoing research, SMD makes its source code fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD. In this article, we describe several data analysis tools implemented in SMD and we discuss features of our software release.
PMCID: PMC1781111  PMID: 17182626
17.  A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB 
BMC Bioinformatics  2006;7:489.
Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support.
We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion.
MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML.
PMCID: PMC1687205  PMID: 17087822
18.  Caryoscope: An Open Source Java application for viewing microarray data in a genomic context 
BMC Bioinformatics  2004;5:151.
Microarray-based comparative genome hybridization experiments generate data that can be mapped onto the genome. These data are interpreted more easily when represented graphically in a genomic context.
We have developed Caryoscope, which is an open source Java application for visualizing microarray data from array comparative genome hybridization experiments in a genomic context. Caryoscope can read General Feature Format files (GFF files), as well as comma- and tab-delimited files, that define the genomic positions of the microarray reporters for which data are obtained. The microarray data can be browsed using an interactive, zoomable interface, which helps users identify regions of chromosomal deletion or amplification. The graphical representation of the data can be exported in a number of graphic formats, including publication-quality formats such as PostScript.
Caryoscope is a useful tool that can aid in the visualization, exploration and interpretation of microarray data in a genomic context.
PMCID: PMC528725  PMID: 15488149
20.  The Stanford Microarray Database: data access and quality assessment tools 
Nucleic Acids Research  2003;31(1):94-96.
The Stanford Microarray Database (SMD; serves as a microarray research database for Stanford investigators and their collaborators. In addition, SMD functions as a resource for the entire scientific community, by making freely available all of its source code and providing full public access to data published by SMD users, along with many tools to explore and analyze those data. SMD currently provides public access to data from 3500 microarrays, including data from 85 publications, and this total is increasing rapidly. In this article, we describe some of SMD's newer tools for accessing public data, assessing data quality and for data analysis.
PMCID: PMC165525  PMID: 12519956
21.  Design and implementation of microarray gene expression markup language (MAGE-ML) 
Genome Biology  2002;3(9):research0046.1-research0046.9.
Meaningful exchange of microarray data is currently difficult because it is rare that published data provide sufficient information depth or are even in the same format from one publication to another. MAGE will help microarray data producers and users to exchange information by providing a common platform for data exchange, and MAGE-STK will make the adoption of MAGE easier.
Meaningful exchange of microarray data is currently difficult because it is rare that published data provide sufficient information depth or are even in the same format from one publication to another. Only when data can be easily exchanged will the entire biological community be able to derive the full benefit from such microarray studies.
To this end we have developed three key ingredients towards standardizing the storage and exchange of microarray data. First, we have created a minimal information for the annotation of a microarray experiment (MIAME)-compliant conceptualization of microarray experiments modeled using the unified modeling language (UML) named MAGE-OM (microarray gene expression object model). Second, we have translated MAGE-OM into an XML-based data format, MAGE-ML, to facilitate the exchange of data. Third, some of us are now using MAGE (or its progenitors) in data production settings. Finally, we have developed a freely available software tool kit (MAGE-STK) that eases the integration of MAGE-ML into end users' systems.
MAGE will help microarray data producers and users to exchange information by providing a common platform for data exchange, and MAGE-STK will make the adoption of MAGE easier.
PMCID: PMC126871  PMID: 12225585
22.  Identification of Genes Periodically Expressed in the Human Cell Cycle and Their Expression in TumorsD⃞ 
Molecular Biology of the Cell  2002;13(6):1977-2000.
The genome-wide program of gene expression during the cell division cycle in a human cancer cell line (HeLa) was characterized using cDNA microarrays. Transcripts of >850 genes showed periodic variation during the cell cycle. Hierarchical clustering of the expression patterns revealed coexpressed groups of previously well-characterized genes involved in essential cell cycle processes such as DNA replication, chromosome segregation, and cell adhesion along with genes of uncharacterized function. Most of the genes whose expression had previously been reported to correlate with the proliferative state of tumors were found herein also to be periodically expressed during the HeLa cell cycle. However, some of the genes periodically expressed in the HeLa cell cycle do not have a consistent correlation with tumor proliferation. Cell cycle-regulated transcripts of genes involved in fundamental processes such as DNA replication and chromosome segregation seem to be more highly expressed in proliferative tumors simply because they contain more cycling cells. The data in this report provide a comprehensive catalog of cell cycle regulated genes that can serve as a starting point for functional discovery. The full dataset is available at
PMCID: PMC117619  PMID: 12058064
23.  Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO) 
Nucleic Acids Research  2002;30(1):69-72.
The Saccharomyces Genome Database (SGD) resources, ranging from genetic and physical maps to genome-wide analysis tools, reflect the scientific progress in identifying genes and their functions over the last decade. As emphasis shifts from identification of the genes to identification of the role of their gene products in the cell, SGD seeks to provide its users with annotations that will allow relationships to be made between gene products, both within Saccharomyces cerevisiae and across species. To this end, SGD is annotating genes to the Gene Ontology (GO), a structured representation of biological knowledge that can be shared across species. The GO consists of three separate ontologies describing molecular function, biological process and cellular component. The goal is to use published information to associate each characterized S.cerevisiae gene product with one or more GO terms from each of the three ontologies. To be useful, this must be done in a manner that allows accurate associations based on experimental evidence, modifications to GO when necessary, and careful documentation of the annotations through evidence codes for given citations. Reaching this goal is an ongoing process at SGD. For information on the current progress of GO annotations at SGD and other participating databases, as well as a description of each of the three ontologies, please visit the GO Consortium page at SGD gene associations to GO can be found by visiting our site at
PMCID: PMC99086  PMID: 11752257
24.  The Stanford Microarray Database 
Nucleic Acids Research  2001;29(1):152-155.
The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77–80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73–76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10–14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332–333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45–48] and can be accessed at
PMCID: PMC29818  PMID: 11125075
25.  Saccharomyces Genome Database provides tools to survey gene expression and functional analysis data 
Nucleic Acids Research  2001;29(1):80-81.
Upon the completion of the Saccharomyces cerevisiae genomic sequence in 1996 [Goffeau,A. et al. (1997) Nature, 387, 5], several creative and ambitious projects have been initiated to explore the functions of gene products or gene expression on a genome-wide scale. To help researchers take advantage of these projects, the Saccharomyces Genome Database (SGD) has created two new tools, Function Junction and Expression Connection. Together, the tools form a central resource for querying multiple large-scale analysis projects for data about individual genes. Function Junction provides information from diverse projects that shed light on the role a gene product plays in the cell, while Expression Connection delivers information produced by the ever-increasing number of microarray projects. WWW access to SGD is available at
PMCID: PMC29796  PMID: 11125055

Results 1-25 (27)