The Ovarian Kaleidoscope database (OKdb) is an online, searchable, public database containing text-based and DNA microarray data to facilitate ovarian research. Using key words and predetermined categories, users can search ovarian gene information based on gene function, cell type of expression, cellular localization, hormonal regulation, mutant phenotypes, chromosomal location, ligand-receptor relationship, and other criteria, either alone or in combination. For individual genes, users can access more than 10 extensive DNA microarray datasets to interrogate gene expression patterns in a development-specific and cell type-specific manner. All ligand and receptor genes expressed in the ovary are matched to facilitate investigation of paracrine/autocrine signaling. More than 3500 ovarian genes in the database are matched to 185 gene pathways in the Kyoto Encyclopedia of Genes and Genomes to allow for elucidation of gene interactions and relationships. In addition to >400 genes with infertility or subfertility phenotypes when mutated in mice or humans, the OKdb also lists ~50 and ~40 genes associated with polycystic ovarian syndrome and primary ovarian insufficiency, respectively. The expanding OKdb is updated weekly, and ovarian researchers can submit new genes to gain instant access to DNA microarray datasets for them. The database also serves as a virtual community for ovarian researchers, allowing users to post comments on individual gene pages instantly through an automated Web-discussion system. In the coming years, we will continue to add new features to serve the ovarian research community.
More than 10 years ago, we set up an online, searchable, public database of genes expressed in the ovary, together with their ovarian expression patterns and functions. The Ovarian Kaleidoscope database (OKdb; http://ovary.stanford.edu; Fig. 1) provides information on the biological function, expression pattern, and regulation of genes expressed in the ovary. In addition, it serves as a gateway to other online resources relevant to the ovarian research community, offering links to PubMed (http://www.ncbi.nlm.nih.gov/pubmed/), Entrez Gene (http://www.ncbi.nlm.nih.gov/gene), Online Mendelian Inheritance in Man (OMIM; http://www.ncbi.nlm.nih.gov/omim), Mouse Genome Informatics (MGI; http://www.informatics.jax.org/), Mammalian Reproductive Genetics (MRG; http://mrg.genetics.washington.edu/), and other sites for access to the published literature and to nucleotide and amino acid sequences. The OKdb also lists the chromosomal positions of ovarian genes, together with human and murine mutation phenotypes. Each ovarian gene entry is annotated manually by experts, and weekly PubMed alerts (based on the key words “ovary,” “granulosa,” “luteal,” and “oocyte”) have allowed timely incorporation of new publications into the database throughout the last decade. As of January 2012, the OKdb contained ~3400 human ovarian genes, representing about 15% of the entire genome. An estimated 20–30 new gene pages are submitted per month. Although registration is not required for database access, there are >300 registered users. According to the user monitor, 20–25 distinct users throughout the world access the database every day, with an accumulated >98 000 worldwide visits since 2001.
During the last decade, we made major improvements by: 1) including a history section, with Roger Short (University of Melbourne) and Anthony Zeleznik (University of Pittsburgh) contributing chapters on the ancient and modern history of ovarian research; 2) incorporating all paracrine polypeptide ligands and their cognate receptors in the ovary based on our Human Plasma Membrane Receptome database (http://receptome.stanford.edu/HPMR/); 3) establishing “KEGG” maps that let users survey all ovarian genes found in specific gene pathways annotated by the Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg/), thus helping to elucidate gene pathways in the ovary; 4) adding the “ovulatory gene pathways” section, authored by JoAnne Richards (Baylor College of Medicine), to depict genes involved in ovulation; and 5) publishing a minireview on transgenic mouse models by Jodi Flaws (University of Illinois at Urbana-Champaign) to highlight ovarian gene mutations and associated phenotypes.
We have set up an online instruction manual in the OKdb that provides step-by-step instructions to help users become familiar with the key features of the database. We have also published two papers describing the database [1, 2]. Unsolicited by us, Science NetWatch featured the OKdb twice, in 2000 and 2005, under the titles “Hot pick” and “DATABASE: Foundations of Fertility,” respectively. The OKdb was also highlighted in an editorial in the journal Endocrinology.
Published papers on the ovary (PubMed search: ovary NOT CHO, to exclude studies that merely use Chinese hamster ovary cells) number nearly 100 000. It is difficult for individual investigators, especially junior researchers, to sort through diverse genetic and physiological information on different ovarian genes to understand their localization, regulation, and functions. Although multiple biomedical databases have been established to manage the rapidly expanding body of bioinformation, the existing databases have minimal information about tissue-specific expression patterns and the functional roles of genes. Google and PubMed queries are valuable for finding specific publications based on key words, and Google Scholar (http://scholar.google.com/) helps identify highly cited publications. However, few tools allow text-based searches for genes important for specific physiological processes or expressed in specific cell types. Moreover, most of these databases, with the exception of OMIM, Entrez, and MGI, lack curation and annotation by experts in the field.
The OKdb provides genome-wide analyses of ovarian genes, including all transcription factors, secreted proteins, genes expressed in oocytes, genes important for fertility, and different combinations of these. Unlike Google Scholar and PubMed, the OKdb is searchable not only by gene name but also by criteria such as gene ontology, cellular function (e.g., enzymes), and cellular localization (e.g., nuclear) of genes and gene products. For each gene, the database also lists expression in different ovarian cell types (e.g., granulosa cells), association with ovarian processes (e.g., ovulation), mutation phenotypes, and chromosomal location. Like viewing a kaleidoscope, users can search genes from different angles to obtain genome-wide views of ovarian genes, including all oocyte genes, all ovarian transcription factors, all cytoplasmic proteins, all genes regulated by luteinizing hormone (LH), etc. The “combined search” option further allows users to obtain an initially large set of results (e.g., search “oocyte” to obtain oocyte-expressed genes) and then narrow it using additional criteria (e.g., search “oocyte” and “ligands” to obtain ligands expressed in oocytes). A “cell-specific” option allows users to search for genes whose expression is restricted to particular cell types; for genes expressed in two or more cell types, the user can check the “cell-specific” box in combination with any number of listed cell types. Users can also search by key word to find all gene pages containing a particular term (e.g., collagen) or all gene pages citing a particular first author. On each gene page, users can find nearly all publications on the gene of interest, from the last decade and earlier, sorted into categories such as cell type of expression, mutant phenotypes, ovarian processes, and hormonal regulation.
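To make the logic of the “combined search” and “cell-specific” options concrete, here is a minimal Python sketch. The `GenePage` record and `combined_search` function are hypothetical illustrations, not the OKdb's implementation, and reading “cell-specific” as “expressed in the selected cell types and in no others” is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class GenePage:
    # Hypothetical, simplified stand-in for an OKdb gene record.
    symbol: str
    cell_types: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def combined_search(pages, cell_types=(), categories=(), cell_specific=False):
    """Return symbols of pages matching ALL requested cell types and categories.

    With cell_specific=True, a gene qualifies only if it is expressed in the
    requested cell types and in no other cell type (an assumed reading of the
    "cell-specific" box).
    """
    wanted_cells = set(cell_types)
    wanted_cats = set(categories)
    hits = []
    for page in pages:
        if not wanted_cells <= page.cell_types:
            continue  # missing a requested cell type
        if not wanted_cats <= page.categories:
            continue  # missing a requested category (e.g. "ligand")
        if cell_specific and page.cell_types != wanted_cells:
            continue  # also expressed in other cell types
        hits.append(page.symbol)
    return hits


# Toy records, not real OKdb content.
pages = [
    GenePage("GENE_A", {"oocyte"}, {"ligand"}),
    GenePage("GENE_B", {"oocyte", "granulosa"}, {"ligand"}),
    GenePage("GENE_C", {"oocyte"}, {"receptor"}),
]
print(combined_search(pages, ["oocyte"], ["ligand"]))  # ['GENE_A', 'GENE_B']
```

Each added criterion intersects the result set, so a combined search can only narrow an earlier query, never widen it.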
Together with other related information, the OKdb provides a comprehensive view of gene function and expression. As the database continues to expand, searching it reveals new patterns of ovarian gene expression, regulation, and function.
Genome-wide DNA microarray studies have become an important research tool for investigating the expression profiles of all genes. Faced with exponential growth in GenBank sequences and in DNA microarray data deposited in the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) at the National Center for Biotechnology Information (NCBI), biologists are bombarded with massive amounts of data on gene expression patterns in diverse cell types under different physiological and pathological states. However, the major deficiencies of DNA microarray analyses are the inability to cross-reference results dynamically with available knowledge bases and the difficulty of extracting useful expression data for individual genes from massive datasets.
In addition to providing a platform for analyzing text-based literature on ovarian genes, the OKdb incorporates data from diverse large-scale DNA microarray studies available from GEO at the NCBI, allowing global ovarian gene expression patterns to be investigated using dynamic graphic plots. At present, the expression pattern of a given gene is available in more than 10 DNA microarray datasets. From each gene page, users can access DNA microarray data on the expression of the gene of interest at different developmental stages of the embryonic ovary; in oocytes from primordial to large antral follicles; during early stages of primordial follicle formation and activation; after gonadotropin stimulation of antral follicle maturation and after ovulation of preovulatory follicles; and during different stages of early embryonic development in mice and humans [11–13]. All of the expression patterns are plotted graphically for easy comparison. Furthermore, a link to Bio-GPS (http://biogps.gnf.org/) provides DNA microarray data on more than 100 tissues and cell types to reveal the tissue-specific expression of a given gene.
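The per-gene plots described above start from public GEO series. As a minimal sketch, assuming a GEO series-matrix text file (metadata lines prefixed with `!`, a tab-delimited table headed by `ID_REF`) and a user-supplied mapping from GSM sample accessions to developmental stages, one probe's values can be extracted and averaged per stage. Mapping probes to genes requires the platform (GPL) annotation, which is not shown; `read_series_matrix` and `profile_by_stage` are hypothetical names, and treating empty or `null` cells as missing is an assumption.

```python
import csv
from collections import defaultdict
from statistics import mean


def read_series_matrix(text):
    """Parse the data table of a GEO series-matrix file.

    Metadata lines start with '!'; the table header begins with "ID_REF",
    followed by one GSM sample accession per column.
    Returns (sample_ids, {probe_id: [float or None]}).
    """
    rows = [ln for ln in text.splitlines() if ln and not ln.startswith("!")]
    reader = csv.reader(rows, delimiter="\t")  # strips the quotes around IDs
    header = next(reader)
    samples = header[1:]
    table = {}
    for row in reader:
        # Assumed missing-value markers: empty cell or "null".
        table[row[0]] = [None if v in ("", "null") else float(v) for v in row[1:]]
    return samples, table


def profile_by_stage(samples, values, stage_of):
    """Average replicate samples per stage, in the order stages first appear.

    stage_of maps each GSM accession to a stage label (supplied by the user).
    """
    grouped = defaultdict(list)
    for gsm, value in zip(samples, values):
        if value is not None:
            grouped[stage_of[gsm]].append(value)
    return {stage: mean(vals) for stage, vals in grouped.items()}
```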
The DNA microarray data in the OKdb provide comprehensive views of the ovarian transcriptome. For ovarian researchers interested in a given gene, these data are a first step in evaluating its expression and potential functions. The availability of several independent but overlapping DNA microarray datasets, from oocytes to early embryos, further allows unbiased evaluation of gene expression patterns across different laboratory platforms. For example, a microarray experiment detailing changes in gene expression during oocyte development shows major increases in oocyte GDF9 expression during the transition from primordial to primary follicles (Fig. 2, top left), followed by a plateau in oocytes from large antral follicles. In an extensive oocyte-to-early-embryo microarray dataset, GDF9 expression decreases from preovulatory oocytes to early two-cell embryos and remains at minimal levels from the four- to eight-cell stage through the blastocyst stage (Fig. 2, top right). Likewise, in an extensive DNA microarray dataset on gonadotropin-stimulated follicle maturation and ovulation, c-fos transcripts show a minor induction by follicle-stimulating hormone (FSH) 2 h after treatment of immature mice and a major increase 1 h after treatment with an ovulatory dose of human chorionic gonadotropin (hCG) (Fig. 2, bottom). In all of the microarray plots, hovering the cursor over an individual data point reveals the specific stage and absolute value.
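Because the overlapping datasets come from different platforms, their absolute values are not directly comparable, but the shapes of the profiles are. One common way to compare shapes, used here as an assumption rather than the OKdb's own method, is to z-score each profile before locating the stage or time point of peak expression. The values below are toy numbers, not data from Fig. 2.

```python
from statistics import mean, pstdev


def zscore(profile):
    """Rescale a {stage: value} profile to mean 0 and unit population SD.

    This removes platform-specific offsets and scales so that the shape
    of a profile can be compared across datasets.
    """
    values = list(profile.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return {stage: 0.0 for stage in profile}  # flat profile
    return {stage: (v - mu) / sd for stage, v in profile.items()}


def peak_stage(profile):
    """Stage or time point with the highest expression value."""
    return max(profile, key=profile.get)


# Toy values on two different scales with the same shape.
platform_a = {"0 h": 1.0, "1 h": 5.0, "2 h": 3.0}
platform_b = {"0 h": 10.0, "1 h": 50.0, "2 h": 30.0}
print(peak_stage(zscore(platform_a)), peak_stage(zscore(platform_b)))  # 1 h 1 h
```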
To view expression patterns of ovarian genes not yet listed in the OKdb, users can register in the database and submit the Entrez Gene ID, which automatically establishes a new gene page. On the new page, gene names, symbols, synonyms, chromosomal locus, links to OMIM and other databases, and NCBI gene summaries are retrieved automatically. Once a new gene page is established, users gain instant access to the available DNA microarray data for the gene of interest.
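The automatic retrieval of gene names, synonyms, locus, and OMIM links from an Entrez Gene ID can be approximated with NCBI's public E-utilities ESummary service. The sketch below is an assumption about how such a lookup might work, not the OKdb's code; the JSON field names (`name`, `description`, `otheraliases`, `maplocation`, `mim`, `summary`) reflect our understanding of the Gene ESummary output and should be checked against a live response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(gene_id):
    """E-utilities ESummary request for one Entrez Gene ID, returned as JSON."""
    return ESUMMARY + "?" + urlencode({"db": "gene", "id": gene_id, "retmode": "json"})


def parse_gene_summary(payload, gene_id):
    """Pick out the fields a new gene page would display.

    Field names are assumed from the Gene ESummary JSON format.
    """
    doc = payload["result"][str(gene_id)]
    aliases = doc.get("otheraliases", "")
    return {
        "symbol": doc.get("name", ""),
        "full_name": doc.get("description", ""),
        "synonyms": [a.strip() for a in aliases.split(",") if a.strip()],
        "locus": doc.get("maplocation", ""),
        "omim_ids": doc.get("mim", []),
        "summary": doc.get("summary", ""),
    }


def fetch_gene_summary(gene_id):
    """Network call; NCBI asks clients to keep request rates low."""
    with urlopen(esummary_url(gene_id)) as resp:
        return parse_gene_summary(json.load(resp), gene_id)
```

Only `fetch_gene_summary` touches the network; `parse_gene_summary` can be tested offline on a saved response.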
Because gene transcription is silenced in mature oocytes until zygotic genome activation in early embryos, at the two-cell stage in mice and the four- to eight-cell stage in humans [12, 14], genes expressed in mature oocytes play important roles during early embryonic development and in reprogramming the sperm genome or, after somatic cell nuclear transfer, the somatic cell genome. Because the OKdb includes several extensive DNA microarray datasets covering gene expression from mature oocytes through different phases of early embryo development to the blastocyst stage, we have set up the search criteria “pluripotency” and “epigenetic regulation” to allow searches for genes important for reprogramming and stem cell derivation. Given the wide interest in pluripotency genes shared by oocytes and early embryos, these features are valuable bioinformatic resources for early embryo and embryonic stem cell researchers.
In addition to accessing the literature on and expression patterns of individual genes, users can analyze groups of related genes with tools we have set up. We have linked all ovarian polypeptide ligand and receptor gene pages in the OKdb based on ligand-receptor pairs annotated in the Human Plasma Membrane Receptome [16, 17]. Coupled with the DNA microarray datasets, this feature allows identification of paracrine mechanisms essential for different ovarian processes. For example, the TrkB receptor is expressed in the oocyte according to the “oocyte development” microarrays, whereas ovarian expression of brain-derived neurotrophic factor (BDNF) was found to increase after the preovulatory LH surge in the “follicle maturation and ovulation” microarray. Combined with the annotated ligand-receptor relationship between BDNF and TrkB, these surveys readily identify BDNF as an intraovarian factor induced by the preovulatory LH surge, enabling subsequent investigation of BDNF regulation of nuclear and cytoplasmic oocyte maturation, which is essential for successful oocyte development into preimplantation embryos.
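The BDNF/TrkB example amounts to joining a ligand-receptor table with two expression lists. A minimal sketch follows; `paracrine_candidates` is a hypothetical helper, NTRK2 is the gene symbol for TrkB, and the sets stand in for genes flagged in the microarray datasets named above (the `TOY_` entries are made up).

```python
def paracrine_candidates(pairs, ligand_expressed, receptor_expressed):
    """Ligand-receptor pairs whose ligand is in one expression set and whose
    receptor is in another (e.g. ligand induced by LH, receptor in oocytes).

    pairs: iterable of (ligand_symbol, receptor_symbol), e.g. exported from
    a receptome-style annotation table.
    """
    return sorted(
        (ligand, receptor)
        for ligand, receptor in pairs
        if ligand in ligand_expressed and receptor in receptor_expressed
    )


pairs = [("BDNF", "NTRK2"), ("TOY_L1", "TOY_R1"), ("TOY_L2", "TOY_R2")]
lh_induced = {"BDNF", "TOY_L2"}      # stand-in for a follicle-maturation list
oocyte_genes = {"NTRK2", "TOY_R1"}   # stand-in for an oocyte-development list
print(paracrine_candidates(pairs, lh_induced, oocyte_genes))  # [('BDNF', 'NTRK2')]
```

Passing the same set for both arguments turns the same query into a screen for autocrine candidates.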
Many gene pathways (e.g., apoptosis) are shared by diverse tissues and cell types, including the ovary, and have been adapted during evolution for similar or distinct functions. KEGG maintains a collection of manually drawn pathway maps representing knowledge about diverse interaction networks of gene products. We have matched OKdb genes to 185 KEGG pathway maps. By clicking “KEGG maps” on the OKdb search page, users can see a list of OKdb genes in a given KEGG pathway and then open individual KEGG maps to focus on the ovarian genes in that pathway. Because all genes in a map that have corresponding OKdb pages are labeled in a distinct color, genes without published ovarian literature but present in well-characterized pathways can be easily identified for future analyses. For example, multiple genes in the “DNA replication” pathway have been studied in the ovary, but several key genes in the pathway remain to be investigated (Fig. 3; red genes are found in the OKdb with literature support, whereas black genes are not in the OKdb and lack literature support). Using this approach, one can survey the literature for key ovarian genes and complete a given pathway by investigating its uncharacterized genes, gaining a more comprehensive understanding of ovarian gene interactions in that pathway.
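The red/black coloring of a KEGG map reduces to partitioning a pathway's gene list by OKdb coverage. This sketch assumes the KEGG REST `link` operation (`https://rest.kegg.jp/link/hsa/<pathway>`, returning one tab-separated pathway/gene pair per line) and that OKdb genes are keyed by Entrez Gene ID, which human KEGG gene IDs (`hsa:<id>`) reuse. KEGG's terms of use for the REST service apply; the helper names are hypothetical.

```python
from urllib.request import urlopen

KEGG_LINK = "https://rest.kegg.jp/link/hsa/{pathway}"


def parse_kegg_link(text):
    """Gene IDs from a KEGG REST 'link' response.

    Each non-empty line is '<pathway id><TAB><gene id>',
    e.g. 'path:hsa03030<TAB>hsa:<Entrez Gene ID>'.
    """
    return [line.split("\t")[1] for line in text.splitlines() if line.strip()]


def split_by_coverage(pathway_genes, okdb_entrez_ids):
    """Partition pathway genes into those with and without OKdb pages."""
    okdb = {f"hsa:{gid}" for gid in okdb_entrez_ids}
    covered = [g for g in pathway_genes if g in okdb]      # "red" genes
    uncovered = [g for g in pathway_genes if g not in okdb]  # "black" genes
    return covered, uncovered


def fetch_pathway_genes(pathway="hsa03030"):  # hsa03030: DNA replication
    with urlopen(KEGG_LINK.format(pathway=pathway)) as resp:
        return parse_kegg_link(resp.read().decode())
```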
We have also taken advantage of the human genome map to let users search for ovarian genes by their chromosomal location. A human chromosome map in the OKdb lists all ovary-expressed genes, with links to their individual OKdb gene pages.
The present functional genomics paradigm facilitates understanding of ovarian genetics, physiology, and pathophysiology. The OKdb lists ~400 genes in which mutations can lead to infertility or subfertility phenotypes in humans, mice, or other species, and all of these genes are searchable for access to the original literature. We have also added search criteria for genes associated with the two major ovarian diseases in women: polycystic ovarian syndrome (PCO) and primary ovarian insufficiency (POI). Using the “sequential search” feature, users can retrieve these genes and then sort them by cell expression pattern, physiological function, ovarian process, cellular distribution, and other criteria. For example, searching for “POI” genes followed by “chromosome” shows that the human X chromosome contains the highest number of candidate genes for this disease.
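The “POI” followed by “chromosome” example is a filter followed by a group-and-count. A minimal sketch, assuming each gene carries a set of disease tags and a cytogenetic locus string (e.g. `Xq28`); the records are invented for illustration, and `tally_by_chromosome` is a hypothetical helper.

```python
import re
from collections import Counter


def chromosome_of(locus):
    """Chromosome from a cytogenetic locus such as '3q21' or 'Xq28' (None if absent)."""
    m = re.match(r"(\d{1,2}|X|Y)", locus)
    return m.group(1) if m else None


def chrom_sort_key(chrom):
    """Order chromosomes 1..22, then X, then Y."""
    return (0, int(chrom)) if chrom.isdigit() else (1, chrom)


def tally_by_chromosome(genes, category):
    """Sequential search: keep genes tagged with `category`, then count them
    per chromosome.

    genes: {symbol: (set_of_categories, cytogenetic_locus)}.
    """
    counts = Counter()
    for categories, locus in genes.values():
        if category in categories:
            chrom = chromosome_of(locus)
            if chrom is not None:
                counts[chrom] += 1
    return sorted(counts.items(), key=lambda kv: chrom_sort_key(kv[0]))


toy = {  # made-up records, not real OKdb annotations
    "GENE_1": ({"POI"}, "Xq28"),
    "GENE_2": ({"POI"}, "Xp11.2"),
    "GENE_3": ({"POI", "PCO"}, "3q21"),
    "GENE_4": ({"PCO"}, "11p15"),
}
print(tally_by_chromosome(toy, "POI"))  # [('3', 1), ('X', 2)]
```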
Although massive genome-wide datasets have been obtained for ovarian diseases using linkage, association, candidate gene, and other approaches [18, 19], it remains a great challenge to pull out inconspicuous “needles” (mutations, duplications, and single-nucleotide polymorphisms) from a veritable haystack of genome data. It is important to integrate the known literature and gene expression data for ovarian genes with the multiple “hot spots” in human chromosomes predicted by genome-wide analyses, especially because many of these candidate sequences lie in regulatory regions outside protein-coding sequences. We have therefore linked our individual ovarian gene pages to the University of California Santa Cruz (UCSC) Genome Browser Web site (http://genome.ucsc.edu/cgi-bin/hgTracks?org=human). The UCSC site contains the reference sequences and working draft assemblies for a large collection of genomes and provides portals to the ENCODE project. With a single click, users can explore the sequences of individual OKdb genes in the Genome Browser, zoom in and out, scroll along chromosomes, and view the work of annotators worldwide. The Gene Sorter shows expression, homology, and other information on groups of genes that can be related in many ways. The Table Browser provides convenient access to the underlying database. VisiGene lets users browse a large collection of in situ images from mice and frogs to examine gene expression patterns. Genome Graphs allows users to upload and display genome-wide datasets.
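Integrating OKdb genes with GWAS or linkage “hot spots” is at heart an interval-overlap problem. This sketch assumes 0-based, half-open coordinates on a single genome assembly (hg38 is an assumption; older data may be on hg19) and builds Genome Browser links with the `db` and `position` URL parameters. The `flank` argument is a crude, hypothetical way to count nearby regulatory regions, and all coordinates in the test are made up.

```python
from urllib.parse import urlencode

UCSC = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def genes_in_hotspots(genes, hotspots, flank=0):
    """Map each gene to the hot spots it overlaps.

    genes: {symbol: (chrom, start, end)}; hotspots: [(name, chrom, start, end)].
    `flank` widens each gene on both sides so nearby regulatory regions count.
    """
    hits = {}
    for symbol, (chrom, start, end) in genes.items():
        lo, hi = max(0, start - flank), end + flank
        for name, h_chrom, h_start, h_end in hotspots:
            if chrom == h_chrom and overlaps(lo, hi, h_start, h_end):
                hits.setdefault(symbol, []).append(name)
    return hits


def ucsc_link(chrom, start, end, db="hg38"):
    """Genome Browser URL for a 0-based half-open region (displayed 1-based)."""
    return UCSC + "?" + urlencode({"db": db, "position": f"{chrom}:{start + 1}-{end}"})
```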
Recently, we have adopted CIRCOS, an open-source, Perl-based software package for visualizing data and information. This infographic program uses circular ideograms to visualize genomic data from patients with ovarian diseases and integrate them with the text-based and DNA microarray-based data from animal models found in the OKdb. CIRCOS can display data as scatter, line, and histogram plots, heat maps, tiles, connectors, and text. As shown in Figure 4, left, all human chromosomes are arranged in a ring, with information related to a particular genome location displayed in different layers of the ring (Fig. 4, right). We are placing all OKdb genes around the periphery of the ring according to their location in the human chromosomes, and all OKdb genes with known mutations causing infertility or subfertility phenotypes will be highlighted. After beta testing, additional layers will display genomic data from genome-wide association and linkage analyses of PCO and POI patients as well as perimenopausal women, together with genomic hot spots based on single-nucleotide polymorphisms, gene duplications, indels (insertions and deletions), and related parameters. This graphic approach could allow easy identification of genomic hot spots to reveal molecular mechanisms underlying ovarian pathology and to elucidate the roles of ovarian genes in different physiological processes.
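Placing OKdb genes around a CIRCOS ring requires a plain-text data file with one feature per line. A minimal Python sketch, assuming the Circos text-track convention `chromosome start end label [options]`, the `hs` chromosome prefix used by its bundled human karyotype, and a `color=` option for highlighted genes (all of which should be checked against the Circos documentation); `circos_text_track` is hypothetical.

```python
def circos_text_track(genes, highlight=frozenset(), prefix="hs"):
    """Lines for a Circos text track: '<chrom> <start> <end> <label> [options]'.

    genes: {symbol: (chrom, start, end)} with UCSC-style names such as 'chrX';
    these are renamed to the 'hs' prefix assumed for Circos's human karyotype.
    Genes in `highlight` (e.g. infertility phenotypes) get a color option.
    """
    lines = []
    for symbol, (chrom, start, end) in sorted(genes.items()):
        bare = chrom[3:] if chrom.startswith("chr") else chrom
        line = f"{prefix}{bare} {start} {end} {symbol}"
        if symbol in highlight:
            line += " color=red"
        lines.append(line)
    return lines
```

The returned lines can be joined with newlines and saved as the data file referenced by a `text` plot block in the Circos configuration.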
We are continuing to improve the database for the easy dissemination of text-based and DNA microarray information to ovarian researchers. Next-generation sequencing technology is a powerful and cost-efficient tool for ultra-high-throughput genome and transcriptome analysis. For instance, high-throughput RNA sequencing (RNA-seq) can yield a comprehensive picture of the transcriptome, allowing quantification of all genes and their isoforms across samples. Mature oocytes are transcriptionally silent and rely on the regulation of stored messenger RNA forms during the final stage of oocyte development for successful fertilization and early embryonic development. By analyzing the transcriptome with unprecedented depth and accuracy, RNA-seq could identify thousands of new transcript variants and isoforms in mammalian tissues and organs. Given recent advances in generating mRNA-seq data from germ cells and early embryos, we anticipate incorporating this information into the OKdb as new datasets become available.
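Quantifying genes and isoforms from RNA-seq reads typically starts from read counts normalized for transcript length and sequencing depth. Transcripts per million (TPM) is one widely used measure. The sketch below shows only the arithmetic, with toy counts, and ignores the effective-length and multi-mapping corrections that dedicated quantification tools apply; the function names are hypothetical.

```python
from collections import defaultdict


def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and transcript lengths (bp).

    Simplified: real quantifiers also use effective lengths and resolve
    reads that map to several isoforms.
    """
    rates = {t: counts[t] / (lengths_bp[t] / 1_000) for t in counts}  # reads per kb
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned")
    return {t: r / total * 1e6 for t, r in rates.items()}


def gene_level(isoform_tpm, gene_of):
    """Sum isoform TPMs into gene-level TPMs."""
    genes = defaultdict(float)
    for t, value in isoform_tpm.items():
        genes[gene_of[t]] += value
    return dict(genes)
```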
DNA methylation is a flexible genomic parameter that can change genome function under exogenous influences. It has therefore been described as the main, and heretofore missing, link between genetics, disease, and the environment, with a potentially decisive role in the etiology of many human pathologies. Methylation occurs naturally on cytosine bases at CpG sequences, and differentially methylated cytosines give rise to distinct methylation patterns specific to oocytes, male germ cells, and early embryos that regulate gene expression. The Human Epigenome Project aims to identify, catalog, and interpret genome-wide DNA methylation patterns of all human genes in all major tissues. Given the importance of maternally and paternally imprinted genes, we will incorporate epigenome data into the present database as these data become available.
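Two basic computations underlie CpG methylation data: locating CpG dinucleotides in a sequence and estimating the methylated fraction at each site from bisulfite-sequencing read counts. The observed/expected CpG ratio shown is a common CpG-island criterion. These helpers are illustrative assumptions, not part of the OKdb or the Human Epigenome Project.

```python
def cpg_positions(seq):
    """0-based positions of the C in each CpG dinucleotide on the given strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return len(cpg_positions(s)) * len(s) / (c * g)


def methylation_level(methylated_reads, unmethylated_reads):
    """Fraction of bisulfite reads calling a CpG methylated (NaN if no coverage)."""
    total = methylated_reads + unmethylated_reads
    return methylated_reads / total if total else float("nan")
```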
With rapid advances in social networking, the OKdb also serves as a virtual community for ovarian researchers. We have included an automated Web-discussion system (DiSQUS) on each gene page so that users can post their own comments in real time and read others' opinions on a particular gene of interest. To demonstrate OKdb usage graphically, we have also prepared an online tutorial video on YouTube (http://youtu.be/dknrc-YtZ9c). We anticipate that the ovarian database will continue to expand and provide valuable information for the community of ovarian physiologists during the coming decade.
We thank Dr. Louis DePaolo (Reproductive Sciences Branch in the Center for Population Research of the National Institute of Child Health and Human Development) for continuing support.
1Supported by funds from the National Institutes of Health/National Institute of Child Health and Human Development (U54 HD068158 as part of the Specialized Cooperative Centers Program in Reproduction and Infertility Research).