Database (Oxford). 2009; 2009: bap022.
Published online 2009 December 7. doi:  10.1093/database/bap022
PMCID: PMC2860946

Cildb: a knowledgebase for centrosomes and cilia

Abstract

Ciliopathies, pleiotropic diseases provoked by defects in the structure or function of cilia or flagella, reflect the multiple roles of cilia during development, in stem cells, in somatic organs and germ cells. High throughput studies have revealed several hundred proteins that are involved in the composition, function or biogenesis of cilia. The corresponding genes are potential candidates for orphan ciliopathies. To study ciliary genes, model organisms are used in which particular questions on motility, sensory or developmental functions can be approached by genetics. In the course of high throughput studies of cilia in Paramecium tetraurelia, we were confronted with the problem of comparing our results with those obtained in other model organisms. We therefore developed a novel knowledgebase, Cildb, that integrates ciliary data from heterogeneous sources. Cildb links orthology relationships among 18 species to high throughput ciliary studies, and to OMIM data on human hereditary diseases. The web interface of Cildb comprises three tools, BioMart for complex queries, BLAST for sequence homology searches and GBrowse for browsing the human genome in relation to OMIM information for human diseases. Cildb can be used for interspecies comparisons, building candidate ciliary proteomes in any species, or identifying candidate ciliopathy genes.

Database URL: http://cildb.cgm.cnrs-gif.fr

Introduction

Ciliopathies represent a class of genetic diseases attributed to dysfunction of centrioles and their associated structures and/or derivatives, centrosomes and cilia (1–3). Indeed, the centriole, a barrel-shaped cylinder of triplets of microtubules, fulfills a wide range of functions that can be classified into two broad categories. First, when the centrioles are in the cytoplasm, a complex matrix of proteins is assembled around them, thus forming the centrosome, a microtubule organizing platform directly involved in cell shape, cell polarity and cell division. Second, when anchored in the plasma membrane, the centriole behaves as a basal body and nucleates an axoneme, which, depending on species and/or cell type, can be the backbone of flagella or cilia, be they motile or immotile, sensory or primary cilia (4–6). Recent high throughput studies revealed that these structures are composed of hundreds of proteins. Presently known ciliopathies, such as Kartagener, Bardet-Biedl, Meckel-Gruber, Alström and Joubert syndromes and polycystic kidney disease, originate from ciliary dysfunction during embryonic development and in adult organs, and provoke a combination of multiple symptoms in patients, such as polydactyly, obesity, sterility, mental retardation, kidney polycystosis, deafness, retinal defects, ciliary dyskinesia, sinusitis, otitis and bronchiectasis (7). From the high complexity in protein composition of centriole/basal bodies and cilia/flagella, we can anticipate a growing number of known and orphan diseases to be identified as ciliopathies.

Cilia can have motile functions and are involved in fluid movement (e.g. mucus, cerebrospinal fluid). They can also have sensory functions (e.g. in olfactory neurons, photoreceptors). Primary cilia play prominent roles in development (8) and most likely also in tissue maintenance and regeneration since they are present on stem cells (9). The ciliary axoneme is a cytoskeletal structure highly conserved through evolution, formed by a cylinder of nine doublets of microtubules plus, in most cases, a central pair of microtubules (9 + 2 pattern), enveloped by an extension of the plasma membrane. The ciliary membrane contains ion channels, receptors and other signaling proteins that control axoneme bending for motility or that sense chemical or mechanical stimuli and transduce signals internally (6). Not only is the ultrastructure conserved, but also the core protein composition, so that most protein functions can be extrapolated from one species to another, hence the development of model systems for ciliary function, such as Caenorhabditis, Chlamydomonas, Drosophila, Paramecium, Tetrahymena and Trypanosoma.

Our laboratory, taking advantage of the Paramecium genome sequence (10) and the ParameciumDB model organism database (11), focuses on motile and sensory aspects of ciliary function in Paramecium. Advantages of the model include rapid and efficient RNAi and easy phenotypic description of swimming behavior that relies on ciliary function. We performed high throughput analyses on Paramecium cilia: a proteomics study of isolated cilia (see Supplemental Data) and a study of transcriptome changes during ciliary biogenesis (Arnaiz et al., in preparation). We were confronted with the problem of comparing our data with previous studies of centrioles and cilia in other organisms. This prompted us to build Cildb (http://cildb.cgm.cnrs-gif.fr), a knowledgebase that relates whole proteomes of 18 species through orthology and links the relevant proteins to ciliary studies as well as to the OMIM database of human genetic diseases. Cildb was designed for finding information on cilia and ciliopathies, as illustrated here. In addition, as it contains the whole proteome of each species, Cildb has wider applications such as linking any genetic disease to model organisms.

Objectives and specifications

Three databases related to centrosome, basal bodies and cilia/flagella are currently available, the Centrosomedb [(12), http://centrosome.dacya.ucm.es/], the Ciliome Database [(13), http://www.sfu.ca/~leroux/ciliome_database.htm] and the Ciliaproteome [(14), http://v3.ciliaproteome.org/cgi-bin/index.php]. However, none of these databases met our needs since the genome data on Paramecium is not incorporated, no entry to these databases is possible using sequence information, the raw data from each original study is not available, and the orthology relationships between species were calculated only by BLAST best reciprocal hits, which masks multigene families. Altogether, this led us to design a completely new tool to browse the complex data emanating from very different approaches (proteomics, transcriptomics, comparative genomics, search for promoters) in different model organisms.

A database dedicated to studies of an organelle, such as centriole or cilium, must be open to future discoveries and identification of novel proteins not yet found in present studies. It is thus necessary for the whole proteome of each species to be present in the database. Comparing studies made in different species implies that orthology relationships have to be calculated between species, using an algorithm that takes account of multigene families (a particularly important consideration for the Paramecium genome, but extant in all genomes). In addition, it seemed worthwhile to allow queries using any species as the entry proteome. This implies that orthology calculations have to be made for each pairwise combination of species in the database. To link the studies to the proteomes, the best approach would be to incorporate raw data and use them to assign confidence stringency values to the ciliary proteins identified in queries. It also seemed useful to include data from the OMIM database, not only data that identify genes responsible for human diseases, but also information on diseases imprecisely mapped on the genome and that have many candidate genes in the disease interval. Finally, it seemed that access to the data in Cildb should be versatile and include not only complex queries but also sequence homology searches and a human genome browser that incorporates tracks with ciliary data and OMIM information for navigation along chromosomal regions.

Data sources and computational analyses

Species whose whole proteome is included in Cildb

The present Cildb version, which will be updated at regular intervals in the future, contains the complete set of predicted proteins from the genomes of 18 species, nine of them chosen because high throughput studies on cilia, flagella or centrosomes are available (Caenorhabditis elegans, Chlamydomonas reinhardtii, Drosophila melanogaster, Homo sapiens, Mus musculus, Paramecium tetraurelia, Rattus norvegicus, Trypanosoma brucei, Tetrahymena thermophila), four of them because the organisms are good models for ciliary experiments although no high throughput study is yet published (Ciona intestinalis, Danio rerio, Giardia lamblia, Plasmodium falciparum) and five of them because they lack cilia and centrioles (Arabidopsis thaliana, Dictyostelium discoideum, Escherichia coli, Saccharomyces cerevisiae, Schizosaccharomyces pombe) and can be used in genomic subtractive studies. Sequence information was retrieved either from the EnsEMBL portal when available, or from the database dedicated to the organism (Supplementary Table S1).

Ciliary studies included in Cildb

The ciliary studies incorporated in this version of Cildb are proteomics of centrosome, basal bodies and cilia/flagella of Chlamydomonas, Homo, Rattus, Paramecium, Tetrahymena and Trypanosoma, transcriptome analyses related to the presence of cilia in certain tissues or to ciliary biogenesis in Caenorhabditis, Chlamydomonas and Paramecium, comparative genomics between Homo, Chlamydomonas and Arabidopsis, and search for motifs in promoters in Caenorhabditis and Drosophila (Figure 1, Supplementary Table S1).

Figure 1.
Orthology calculations and links to ciliary studies underlying Cildb. The 18 species analyzed are represented as circle arcs proportional to the size of the proteome, in red when ciliary studies exist (delineated as ticks outside the circle), in orange ...

Determination of orthology relationships

Since whole proteomes may contain splicing variants as well as sets of similar paralogs in multigene families, the way to establish orthology is not straightforward: proteins often cannot be involved in binary orthology relationships identified by best reciprocal BLASTp hits. The Inparanoid program (15) overcomes this problem by relating groups of proteins in one species to groups of proteins in another one, provided that the intraspecific distances between paralogs are shorter than the interspecific distances between the putative orthologs. We thus calculated orthology between the predicted proteins of the 18 genomes cited above using Inparanoid on the output of the pairwise BLASTp comparison between the 18 proteomes, including self-comparisons, 324 comparisons altogether (Figure 1), with the default Inparanoid parameters for eukaryotes.

The stringency of Inparanoid is such that, in case of poor gene annotation (as currently occurs in genome projects with gene truncation, false splicing pattern, gene fusion or splitting), some orthology relationships are missed, leading to an excess of false negatives when given proteins are searched for through orthology in another species. Hence, we added to this calculation the result of analysis of alignments using empirical filters that were manually validated. We carried out Smith–Waterman alignments of all pairs of best hits by BLASTp and added to the Inparanoid results the alignments with at least 30% identity on at least 52% of the length of both proteins, or when the product of both parameters was above 2300 (tolerating better alignments over shorter lengths or the converse).
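The empirical filter described above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, and "length of both proteins" is read here as the alignment coverage of each protein, so the smaller coverage is the limiting one.

```python
def passes_empirical_filter(identity_pct, coverage_a_pct, coverage_b_pct):
    """Return True if an alignment meets the thresholds quoted in the text.

    Assumption: coverage is expressed as a percentage of each protein's
    length, and the smaller of the two coverages is the limiting value.
    """
    coverage = min(coverage_a_pct, coverage_b_pct)
    # Rule 1: at least 30% identity on at least 52% of both proteins.
    strict = identity_pct >= 30 and coverage >= 52
    # Rule 2: product of both parameters above 2300, which tolerates
    # better alignments over shorter lengths or the converse.
    relaxed = identity_pct * coverage > 2300
    return strict or relaxed


# Hypothetical alignments: (identity %, coverage of A %, coverage of B %)
examples = [(35, 60, 80), (80, 30, 40), (25, 50, 50)]
kept = [aln for aln in examples if passes_empirical_filter(*aln)]
```

In this toy set, the first alignment passes the strict rule, the second passes only through the product rule, and the third is rejected.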

Determination of homologs for genome subtraction

For some purposes, both of the above methods are too stringent: although they allow recognition of possible orthologs in two proteomes, they cannot prove that a given protein has no ortholog in a species. Comparative genomics often relies on the presence or absence of sets of orthologs in several species. In such cases, less stringent comparison tools are used and a cutoff of 1e−10 (16) or even higher (17) is employed. We thus also performed a third, low stringency, calculation with a simple cutoff on the BLAST score. Indeed, the BLAST score is much more consistent than the e-value when comparisons are made between proteomes of very different size. The threshold for homolog detection was empirically fixed to ≥70, a score value generally corresponding to an e-value around 1e−10.
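The point about score versus e-value can be illustrated with the standard approximation E = m × n × 2^(−S), where S is the bit score, m the query length and n the database length. The sketch below assumes that the "BLAST score" in the text is the bit score (the text does not say so explicitly), ignores edge-effect corrections, and uses made-up lengths.

```python
def expected_evalue(bit_score, query_len, db_len):
    """Approximate E-value for a bit score: E = m * n * 2^(-S)."""
    return query_len * db_len * 2.0 ** (-bit_score)


def is_homolog(bit_score, threshold=70.0):
    """Low stringency homolog call: a simple cutoff on the score."""
    return bit_score >= threshold


# Same score and query length, two hypothetical proteome sizes (in residues).
small = expected_evalue(70, 400, 2_000_000)
large = expected_evalue(70, 400, 20_000_000)
```

With identical alignments, the e-value grows in proportion to the size of the searched proteome, whereas the score cutoff gives the same answer for both.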

Remapping ciliary studies

To identify centrosome, centriole, basal body and flagellar/ciliary proteins, we remapped all studies published to date on the whole proteomes of the corresponding species. This was performed in two steps, retrieving the protein sequences from the original studies and determining the correspondence of these proteins with the recent proteome versions used for orthology calculations.

We started with 21 studies published in 17 articles, including this one, and concerning nine species altogether (Figure 1; Supplementary Table S1). In most of the articles, supplemental tables give the list of protein accession numbers, which permitted retrieval of the sequences (except for a small minority for which the accession number did not correspond to anything). Two studies provided a list of peptides obtained in proteomics, instead of (18) or in addition to (19) a list of protein accession numbers. In such cases, we mapped the peptides to the present version of the proteomes and retrieved the corresponding proteins. Each protein recovered from the ciliary studies has been flagged with attributes corresponding to the raw results in the studies when available (number of peptides in proteomics, fold change or false discovery rate in transcriptome analyses, score and distance from ATG of X-boxes, e-values in comparative genomics). Altogether, 16 038 protein-study entries were recovered (Supplementary Table S1).

When the genome version used in any given study was different from the version we used for orthology determination, we remapped the proteins to the version used for orthology determination using BLASTp. Indeed, from version to version, the genome annotation evolves, some gene models appear while others disappear, and the structure of some others is modified leading to significant changes in the corresponding protein sequence. We considered a protein as remapped if there was at least 90% identity on 90% of the length of each protein. The remaining proteins were validated or rejected by visual examination of the alignments. In this process, 288 entries were dropped because of no hit in the new genome versions, and 243 rejected by human curation. The 15 443 remaining proteins correspond to 19 051 entries in the new genome versions (the discrepancy between the numbers arises from different treatment of alternative splicing, of paralog families, etc. in the various genome versions). The vast majority of the entries (17 348) were automatically remapped, while 1703 of them were recovered by human curation (Supplementary Table S1).
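The remapping decision can be summarized as a three-way classification. The sketch below is illustrative only: the function name and status labels are hypothetical, and the 90% length criterion is assumed to apply to the coverage of both the old and the new protein.

```python
def remap_status(identity_pct, cov_old_pct, cov_new_pct, has_hit=True):
    """Classify a protein from an older annotation against the new proteome."""
    if not has_hit:
        # No BLASTp hit in the new genome version: the entry is dropped.
        return "dropped"
    if identity_pct >= 90 and min(cov_old_pct, cov_new_pct) >= 90:
        # At least 90% identity on 90% of the length of each protein.
        return "auto"
    # Everything else is validated or rejected by visual examination.
    return "curate"


# Consistency check on the counts quoted in the text.
auto_remapped, curated = 17348, 1703
total_entries = auto_remapped + curated
```

The two counts quoted above add up to the 19 051 entries reported for the new genome versions.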

In addition to the protein ‘flagging’ with ciliary studies, we linked all proteins of the proteomes to general attributes predicted from the amino acid sequence such as molecular weight, isoelectric point, presence of signal peptides (20) and number of transmembrane helices (21).

OMIM data integration

The OMIM database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim) gathers all information about genetic inheritance in man, linking genetic diseases to their corresponding genes when they are known. In addition, genetic diseases still imprecisely localized to one or several chromosome bands are also included in the OMIM database, so that candidate genes encompassed by the chromosomal region can be extracted. In Cildb, we used this information in two different ways. First, we flagged human proteins with the OMIM entry when the direct link exists. The OMIM information is then treated like the other properties. Second, when OMIM entries correspond to several genes, we incorporate the information into a distinct database. In this section of Cildb, there are however as many entries as pairs of candidate protein-disease (530 765 altogether, a number much greater than both 46 591, the number of human proteins, and 11 152, the number of OMIM entries referenced in Cildb). Searches can be done to reveal all human genes present within the genetic region where the disease has been genetically localized, and to display them with any desired attribute, including orthology and occurrence in ciliary studies.

Cildb architecture

The Cildb web site is organized around three main tools, BioMart, NCBI BLAST and GBrowse (Figure 2). Each tool provides its own interface and its own storage system.

Figure 2.
Schema of the structure and possibilities of use of Cildb. The orthology calculations and links to ciliary studies and to OMIM are at the center of Cildb. To access the data, three possibilities of queries are offered: BioMart query using key words or ...

BioMart is a data management system that provides a powerful complex web query interface (22). Programmatic execution of queries is also available via a web-services API, or direct-access software libraries written in Perl. The data are split into three databases (PostgreSQL) corresponding to the three levels of confidence of the orthology. The databases contain 18 datasets (or marts) according to the 18 species. A dataset is a collection of several tables, which follow a BioMart naming convention (‘dataset__content__type’).
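As an illustration of the naming convention, a table name can be split on the double underscore into its three parts. The table name used below is hypothetical and is not taken from the Cildb schema.

```python
def parse_mart_table(name):
    """Split a BioMart-style table name 'dataset__content__type'."""
    parts = name.split("__")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a dataset__content__type name: {name!r}")
    dataset, content, table_type = parts
    return {"dataset": dataset, "content": content, "type": table_type}


# Hypothetical table name, for illustration only.
info = parse_mart_table("ptetraurelia__protein__main")
```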

The NCBI BLAST tool (regular, PSI or PHI BLAST) allows homology searches starting from a protein or DNA sequence (23). We have indexed the proteome of each species separately so that it is possible to query any given proteome or all of the proteomes in Cildb taken together.

The Generic Genome Browser (GBrowse version 1.69), the most popular viewer in the GMOD project (24), composed of a web interface (CGI in Perl) and a BioPerl database (Bio::DB::SeqFeature::Store in MySQL), has been implemented in Cildb for the human genome. We generate GFF3 files (Generic Feature Format version 3) corresponding to each track: human proteins, Inparanoid orthologs and OMIM entries. The OMIM track uses the glyph ‘wave’ (Bio::Graphics::Glyph::wave).
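A GFF3 record is a single line of nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below shows how one track feature could be written; the source name, feature type, identifier and coordinates are placeholders and do not come from the Cildb files.

```python
# Characters that must be percent-encoded inside GFF3 attribute values.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26",
             ",": "%2C", "\t": "%09", "\n": "%0A"}


def _escape(value):
    return "".join(_RESERVED.get(ch, ch) for ch in str(value))


def gff3_line(seqid, source, ftype, start, end, strand, attributes,
              score=".", phase="."):
    """Format one feature as a nine-column GFF3 line."""
    attr = ";".join(f"{key}={_escape(val)}" for key, val in attributes.items())
    columns = [seqid, source, ftype, str(start), str(end),
               str(score), strand, str(phase), attr]
    return "\t".join(columns)


# Placeholder feature for an OMIM-like track (coordinates are made up).
line = gff3_line("chr3", "Cildb", "region", 1000, 2000, ".",
                 {"ID": "omim_606995", "Name": "Senior-Loken syndrome 3; SLSN3"})
```

The semicolon inside the name is escaped so that it is not mistaken for an attribute separator.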

In addition, Cildb contains protein pages, which gather all the information stored in Cildb for any protein. These pages are built using a Model View Controller (MVC) system implemented in Perl with the Template::Toolkit module (25).

Cildb updates

Updates of Cildb will be performed periodically after curation of the literature to incorporate new ciliary studies and new proteomes from the corresponding species. We also plan to incorporate new proteomes of species whose phylogenetic position is of interest, whether or not ciliary studies are available. Genomes already in Cildb will also be updated periodically. This means that Inparanoid calculations must be carried out for major genome releases with concomitant remapping of ciliary studies to the proteomes. This procedure is CPU-intensive and also requires human curation, so that we plan to make such updates every 18 months. The next version of Cildb (V2.0) is in early steps of preparation.

User interface

Cildb can be entered from any of its 18 species and gives access to properties identified in any genome through orthology relationships, whatever the species users are interested in. The three tools described above, BioMart, BLAST and GBrowse are available. We will present the use of these tools in Cildb.

A complex BioMart query is decomposed into four steps: (i) choice of the dataset; (ii) filtering of the dataset to select only the proteins with the desired properties; (iii) choice of the attributes to be displayed with the output, which will usually be different from the properties used for the query; and (iv) data retrieval. For simple key word queries, we added a quick search box accessible from every Cildb page.

Datasets: Choosing the dataset requires two operations. First, the user chooses the orthology/homology calculation method (Inparanoid, Inparanoid plus filtered best hits, or Filtered BLASTp) according to the desired use of the database. Then the reference species is chosen among the 18 that are listed.

Filters: Filtering uses protein properties, so that only proteins with the desired properties are retrieved. On the BioMart page (Figure 3), the filters are grouped in six categories.

  • The general filter is used to look for proteins according to general properties, ID number, synonyms, key words in the reference species or in linked orthology groups, molecular weight, isoelectric pH, etc.
  • The orthology filter permits the user to select proteins with (or without) orthologs in any combination of species.
  • There are three ciliary filters to look for proteins identified in ciliary studies or whose orthologs are identified in ciliary studies. The three filters are different in that the advanced ciliary study filter examines raw results of ciliary studies, the ciliary study filter (all) examines ciliary results according to pre-calculated stringency with multiple selection using the Boolean operator ‘AND’, while the ciliary study filter (any) uses the Boolean operator ‘OR’.
  • The OMIM filter permits selection of proteins whose human orthologs are indexed in OMIM.
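The difference between the ‘all’ and ‘any’ ciliary study filters can be shown with set operations. This is a toy sketch: the protein identifiers and study names are invented, and the real filters also take the pre-calculated stringency into account.

```python
def ciliary_filter_all(evidence, selected_studies):
    """Keep proteins found in every selected study (Boolean AND)."""
    wanted = set(selected_studies)
    return sorted(p for p, studies in evidence.items() if wanted <= set(studies))


def ciliary_filter_any(evidence, selected_studies):
    """Keep proteins found in at least one selected study (Boolean OR)."""
    wanted = set(selected_studies)
    return sorted(p for p, studies in evidence.items() if wanted & set(studies))


# Hypothetical evidence table: protein -> studies in which it was identified.
evidence = {
    "P1": {"proteomics_A", "transcriptome_B"},
    "P2": {"proteomics_A"},
    "P3": set(),
}
selection = ["proteomics_A", "transcriptome_B"]
both = ciliary_filter_all(evidence, selection)
either = ciliary_filter_any(evidence, selection)
```

Here only P1 satisfies the ‘all’ filter, while P1 and P2 satisfy the ‘any’ filter.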

Attributes: The choice of attributes determines the properties that are displayed for each protein retrieved by the query. The attributes are organized by species. In the reference species (the one chosen as dataset at the beginning of the query, listed on the first line), numerous fields are found: protein ID, synonyms, description, molecular weight, isoelectric pH, presence of transmembrane helices, signal peptides, etc. Since we linked human proteins to the OMIM entries for human genetic disease, if a protein of a given species has a human ortholog referenced in OMIM, this can be displayed as an attribute in the output of the query. When ciliary studies have been conducted in this species, the detailed raw results of the studies are given for each relevant protein. The attributes concerning other species reflect the presence or not of orthologs (according to the method originally selected in the dataset) and, when they exist, whether they have been found in ciliary studies in this species. Detailed results of ciliary studies are not provided for orthologs since a given protein may have several orthologs (i.e. in-paralogs in the Inparanoid family), generating different sets of raw results that cannot be displayed in the BioMart interface. For simpler searching and posting, we also classified the raw results of ciliary studies as low, medium and high stringency, described in detail in Supplementary Table S1. For example, in proteomic analyses, low stringency means identification of a protein by one peptide detected by mass spectrometry, medium stringency by at least two peptides, high stringency by at least four peptides.
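The proteomics example given above (one peptide, at least two peptides, at least four peptides) maps directly onto a small classifier. The function name is hypothetical, and the thresholds for other study types are those of Supplementary Table S1, which are not reproduced here.

```python
def proteomics_stringency(n_peptides):
    """Classify a proteomics identification by its number of peptides."""
    if n_peptides >= 4:
        return "high"
    if n_peptides >= 2:
        return "medium"
    if n_peptides >= 1:
        return "low"
    return None  # protein not identified in the study


levels = [proteomics_stringency(n) for n in (0, 1, 2, 3, 4, 10)]
```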

Figure 3.
Screenshot of a typical query page of Cildb, here using the Homo sapiens whole proteome as a dataset and displaying the categories of filters that can be used for the query.

Whenever sequence information is needed for the output of a query, the ‘sequence’ button can be selected from the attribute page.

Data retrieval: The ‘count’ button displays the number of proteins that pass the filter out of the total number of proteins in the proteome. The ‘results’ button gives access to the list of matching proteins, displayed with all selected attributes. Navigation to the filter or attribute pages makes it easy to refine the search or the display. The results may be exported as text or xls files (or in fasta format for sequences) for further analyses. Cildb results as well as xls exports contain internal links to protein pages in which a summary of all Cildb information on the protein is displayed, as well as internal and external links (Figure 2).
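The same four steps (dataset, filters, attributes, retrieval) apply to programmatic access, since BioMart web services accept queries written as an XML document. The sketch below only builds such a document; the dataset, filter and attribute names are hypothetical placeholders, and the names actually exposed by Cildb would have to be looked up in its BioMart interface.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes, count=False):
    """Build a BioMart XML query string (names are caller-supplied)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "count": "1" if count else "",  # "1" asks for a count only
        "datasetConfigVersion": "0.6",
    })
    # Step 1: choice of the dataset.
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Step 2: filters restrict the proteins that are returned.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    # Step 3: attributes are the columns displayed for each protein.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    # Step 4 (retrieval) would send this string to the mart service.
    return ET.tostring(query, encoding="unicode")


# All names below are invented for illustration.
xml_query = build_biomart_query(
    dataset="hsapiens_inparanoid",
    filters={"min_ciliary_studies": 3},
    attributes=["protein_id", "description", "omim_id"],
)
```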

The second way to enter Cildb is via the BLAST tool. This allows the user to retrieve proteins from Cildb, either in the whole database or for a given species, by sequence alignment using the NCBI BLAST algorithm. When performed on a single organism, the Cildb BLAST output provides, in addition to classical alignment output, links to protein pages of Cildb as well as to BioMart views of target proteins, to be filtered by other criteria or displayed with Cildb attributes.

Finally, Cildb can be entered via the human genome through GBrowse. In addition to tracks to visualize genes and encoded proteins, browsing the human genome in Cildb displays tracks for two kinds of OMIM data: the OMIM description of the corresponding gene and associated diseases, if they exist, and the overall localizations on the chromosomes of OMIM entries not precisely allocated to a gene, but rather to a chromosomal region. Whether a given protein has been identified through ciliary studies (either directly or through one of its orthologs), or has Inparanoid orthologs, is indicated as a track in the browser.

Use cases

The applications of Cildb are limited only by the needs and the imagination of the user. First of all, just listing proteins found in particular studies with many sorts of attributes is already a powerful improvement in the field. However, the major innovation of this database consists in allowing all kinds of experiments to look for particular proteins with defined properties in any combination, be they biophysical, from descriptors, orthology relationships, ciliary studies, or in relation to OMIM information.

Comparison of ciliary proteomics in Paramecium to other ciliary studies

Our work on ciliary proteomics in Paramecium, presented in the Supplementary Data (Supplementary Figure S1, Supplementary Tables S2 and S3), was evaluated by comparison to other ciliary studies performed formerly in other species. This evaluation was performed using the BioMart query tool of Cildb (Figure 4). The figure clearly shows that: (i) in each species, approximately half of the ciliary proteins identified through ciliary proteomics possess orthologs in the other species. Only Paramecium and Tetrahymena, closer in evolution than the other species, share more orthologs. (ii) Our Paramecium ciliary proteomics is more specific than the other ones in the comparison, although less sensitive than the one of Pazour and colleagues (26) in Chlamydomonas. Indeed, no ribosomal or histone contaminants were detected in Paramecium cilia preparations. On the other hand, starting from the ciliary proteome in Paramecium, Tetrahymena or Trypanosoma, a high proportion of the corresponding Chlamydomonas orthologs were already identified by the Chlamydomonas study (26) (dark green in the histograms), whereas the orthologs of Chlamydomonas ciliary proteins identified in ref. (26) were identified in the ciliary proteomics in each other species at a maximal rate of 50%. Pairwise comparisons of ciliary proteomic studies show that our Paramecium ciliary study ranks just after the study in Chlamydomonas in terms of sensitivity, probably owing to the fact that only whole purified cilia were analyzed in Paramecium, in contrast to Chlamydomonas where sub-fractions of cilia were also analyzed by mass spectrometry.

Figure 4.
Comparison of ciliary proteomic studies from different unicellular models. Using Cildb, we compared the proteomic studies of purified cilia/flagella of Chlamydomonas, Paramecium, Tetrahymena and Trypanosoma. The protocol was the same for each study: (i) ...

Building a new ciliary proteome

Using Cildb, one can identify proteins likely to be constituents of cilia even in species devoid of any specific ciliary study, e.g. Danio rerio, Giardia lamblia or Ciona intestinalis, just by looking in this species for proteins having orthologs in other species that have been identified as ciliary proteins. Supplementary Table S4 gives an example of a Cildb output in xls format, in this case a list of the 975 proteins of Danio rerio with orthologs in at least three different species identified as a ciliary protein with medium confidence (see definition in Supplementary Table S1).
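The selection rule used for Supplementary Table S4 can be sketched as follows. The input structure and identifiers are hypothetical, and "medium confidence" is interpreted here as medium or higher stringency, which is an assumption.

```python
RANK = {"low": 1, "medium": 2, "high": 3}


def candidate_ciliary_proteins(ortholog_evidence, min_species=3,
                               min_stringency="medium"):
    """Select proteins whose orthologs are ciliary in enough species.

    ortholog_evidence maps each protein to {species: best stringency of a
    ciliary ortholog, or None when no ortholog was found in ciliary studies}.
    """
    threshold = RANK[min_stringency]
    selected = []
    for protein, by_species in ortholog_evidence.items():
        supporting = sum(
            1 for level in by_species.values()
            if level is not None and RANK[level] >= threshold
        )
        if supporting >= min_species:
            selected.append(protein)
    return sorted(selected)


# Toy data with invented identifiers.
toy = {
    "DR_0001": {"Hsapiens": "high", "Creinhardtii": "medium", "Ptetraurelia": "medium"},
    "DR_0002": {"Hsapiens": "high", "Creinhardtii": "low", "Ptetraurelia": None},
    "DR_0003": {"Hsapiens": "medium", "Tbrucei": "medium", "Tthermophila": "high",
                "Creinhardtii": None},
}
candidates = candidate_ciliary_proteins(toy)
```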

Real-time comparative genomics

Comparative genomics provided a powerful strategy for the identification of centriole/cilia proteins by considering all Chlamydomonas reinhardtii proteins having an ortholog in Homo sapiens but not in Arabidopsis thaliana (subtraction of ‘non-ciliary genomes’ from the common protein complement between two ‘ciliary genomes’) (16). Similar experiments can now be done online using Cildb, with any combination of species. For that purpose, we provide homology based on a BLAST score cutoff of 70 (see Experimental Procedures section in Supplementary Data), as did the above-mentioned study with a 1e−10 cutoff (16), rather than on Inparanoid orthology, to avoid an abundance of false negatives. The exact reproduction of the experiment reported in ref. (16), using Cildb is presented in Supplementary Tables S5a and S5b. The interest here is that the in silico experiment using Cildb can be validated by ‘bench’ experiments compiled in the database, just by looking at the experimental ciliary attributes of the identified proteins. Despite good overall concordance, differences appear between the original experiment and the Cildb screen, but careful examination reveals that all differences arise from the evolution of the annotation between successive genome versions: genes found with the Cildb screen but not by Li et al. (16), correspond to gene models present in version 3 but not in version 2 of the Chlamydomonas genome; conversely, genes not found with Cildb orthology but identified by Li et al. (16), all correspond to gene models present in the recent version of the Arabidopsis genome but not in the former one (so that these ‘non-ciliary’ proteins could not be subtracted from the data set at that time). Thus, the procedure used in Cildb online supports comparative genomics experiments using any combination of the 18 species currently available.
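The subtraction logic (present in one set of species, absent from another, with a score cutoff of 70) can be sketched with a toy table of best BLAST scores. The identifiers and scores are invented; only the cutoff comes from the text.

```python
def genome_subtraction(best_scores, present_in, absent_from, threshold=70.0):
    """Keep proteins with a homolog in every 'present_in' species and in
    none of the 'absent_from' species, using a simple score cutoff.

    best_scores maps each protein to {species: best BLAST score}; a missing
    species means that no hit was found.
    """
    selected = []
    for protein, scores in best_scores.items():
        has_all = all(scores.get(sp, 0.0) >= threshold for sp in present_in)
        has_none = all(scores.get(sp, 0.0) < threshold for sp in absent_from)
        if has_all and has_none:
            selected.append(protein)
    return sorted(selected)


# Toy Chlamydomonas proteins with made-up scores.
toy_scores = {
    "CR_A": {"Hsapiens": 250.0, "Athaliana": 40.0},   # kept
    "CR_B": {"Hsapiens": 300.0, "Athaliana": 180.0},  # subtracted
    "CR_C": {"Hsapiens": 55.0},                       # no human homolog
}
ciliary_candidates = genome_subtraction(toy_scores, ["Hsapiens"], ["Athaliana"])
```

Lowering or raising the threshold changes both sides of the subtraction at once, which is why a low stringency homology criterion is used instead of Inparanoid orthology for this kind of screen.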

Identification of ciliopathy genes

From global analyses that can be mined in human or model organisms using Cildb, at least a thousand proteins are likely to be components of, or involved in the biogenesis of centrioles, basal bodies, cilia and flagella. Dysfunction of some of these proteins leads to severe diseases, the ciliopathies. Recent reviews (27,28) listed known ciliopathies and a few more can be retrieved from the literature: CILD6 due to mutations in TXNDC3 (29), CILD9 to mutations in DNAI2 (30), CILD10 to mutations in KTU (31), and CILD11 and CILD12 to mutations in RSPH4a and RSPH9, respectively (32). Altogether, 50 human genes are involved in ciliopathies when they are mutated. Eight of these cannot yet be retrieved from Cildb, because the version of OMIM incorporated in Cildb does not yet display the links to these disorders. To assess the usefulness of Cildb for finding ciliopathies, we performed a BioMart query starting successively from H. sapiens, C. reinhardtii and P. tetraurelia, filtering proteins for their links to a human disorder (2358 entries in the MORBID section of OMIM) and requiring at least three ciliary studies with a medium confidence stringency (Figure 5). Proteins linked to 216 disorders appear in the filtered output, including 20 of the 42 indexed ciliopathies. This represents a ciliopathy enrichment by a factor of 5 (compare 20 out of 216 with 42 out of 2358). In addition, the ciliopathy CILD6, not listed in the reviews, appears in this search. This suggests that many of the 216 disorders identified in this query may be novel ciliopathy-related diseases. The 22 ciliopathies not found may be explained by the fact that not all ciliary proteins are identified in high throughput studies.
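The enrichment factor quoted above is the ratio of two proportions, and can be checked with the numbers given in the text (the function name is only illustrative).

```python
def enrichment(hits_in_subset, subset_size, hits_in_total, total_size):
    """Fold enrichment: proportion in the subset over proportion overall."""
    return (hits_in_subset / subset_size) / (hits_in_total / total_size)


# 20 ciliopathies among 216 filtered disorders, versus 42 among 2358.
factor = enrichment(20, 216, 42, 2358)
```

The result is slightly above 5, in agreement with the factor of 5 stated in the text.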

Figure 5.
Finding novel ciliopathy candidate genes with Cildb. Starting successively with the proteomes of three species, Paramecium tetraurelia, Homo sapiens and Chlamydomonas reinhardtii, we applied the same BioMart filters, ‘at least 3 ciliary evidences ...

We asked whether some of the 216 disorders in the filtered list could be ciliopathies. After examining the types of symptoms in each disease description and the kinds of ciliary evidence, we propose 11 of these diseases as candidate novel ciliopathies: four forms of retinitis pigmentosa, a neuropathy, a neuroblastoma, a recessive deafness, a juvenile myoclonic epilepsy, an aldolase A deficiency, a sporadic breast cancer and a spinal muscular atrophy (see Table 1). Each disease can now be examined with the ciliary evidence in mind to check whether it could indeed be a ciliopathy.

Table 1.
Human genes associated with known and candidate ciliopathies

Finally, several hundred orphan diseases exist in humans, with symptoms that may evoke a ciliary origin (deafness, retinal defects, obesity, polydactyly, kidney polycystosis, mental retardation), but whose chromosomal location has been determined only imprecisely by linkage with markers. In Cildb, we have built a special section, the ‘Hsapiens OMIM database’, in which human proteins and OMIM entries, even imprecisely localized, can be displayed. For instance, the ciliopathy ‘Senior-Loken syndrome 3 (SLSN3)’ (OMIM 606995) is localized to chromosome 3, region q22, a region that encompasses 831 genes. These genes with their attributes can be displayed to help find candidates for this syndrome.
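The positional approach described above amounts to selecting all genes in a cytogenetic region and ranking them by ciliary evidence. The sketch below illustrates that idea only; it does not reproduce the Cildb schema or its BioMart interface. The `Gene` record, the `candidates_in_region` function and the toy gene symbols (`GENE_A` to `GENE_D`) are invented placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str            # placeholder symbol, not a real gene
    chromosome: str        # e.g. "3"
    band: str              # cytogenetic band, e.g. "q22.1"
    ciliary_evidence: int  # number of ciliary studies reporting the gene


def in_band(band, region):
    """True if `band` is the region itself or one of its sub-bands (q22 -> q22.1)."""
    return band == region or band.startswith(region + ".")


def candidates_in_region(genes, chromosome, region, min_evidence=0):
    """Genes located in the region, ordered by decreasing ciliary evidence."""
    hits = [
        g for g in genes
        if g.chromosome == chromosome
        and in_band(g.band, region)
        and g.ciliary_evidence >= min_evidence
    ]
    return sorted(hits, key=lambda g: g.ciliary_evidence, reverse=True)


# Toy records for illustration only (assumed values, not taken from Cildb).
toy_genes = [
    Gene("GENE_A", "3", "q22.1", 4),
    Gene("GENE_B", "3", "q22.3", 0),
    Gene("GENE_C", "3", "q21.3", 5),
    Gene("GENE_D", "7", "q22.1", 2),
]

all_3q22 = candidates_in_region(toy_genes, "3", "q22")
strong_3q22 = candidates_in_region(toy_genes, "3", "q22", min_evidence=3)
print([g.symbol for g in all_3q22])
print([g.symbol for g in strong_3q22])
```

In such a sketch, raising `min_evidence` plays the role of the "at least three ciliary studies" filter used in the BioMart query, narrowing a large positional interval to a short list worth inspecting by hand.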

Conclusion—future challenges

The generation of new high throughput ciliary data from Paramecium prompted us to build Cildb, a new database that integrates heterogeneous information from a variety of sources. The versatility of Cildb makes it a valuable knowledgebase, allowing multi-criteria queries that start from any proteome and use raw data from high throughput ciliary studies. Cildb also permits the identification of ciliopathy genes and can even help identify candidate genes for diseases imprecisely localized on chromosomes. Beyond ciliary data, Cildb contains easy-to-retrieve information pertinent to general analyses of proteins, comparative genomics and linking to OMIM data concerning human genetic disorders.

Although updating Cildb is computationally time-consuming, the procedure is straightforward. The next challenge will be to incorporate additional information necessary for ontology-aware phenotype descriptions (33). This would allow us to add high throughput RNAi studies or genetic screens, such as the ones conducted in the fish Danio rerio (34).

Supplementary data

Supplementary data are available at Database Online.

Funding

CNRS and the Agence Nationale de la Recherche, grant number NT05-2_41522. Funding for open access charge: CNRS.

Conflict of interest statement. None declared.

Acknowledgements

The authors are grateful to Anne Laurençon who spent time testing Cildb and participated in its improvement and to the INRA MIGALE bioinformatics platform for providing computational resources.

References

1. Bornens M. Organelle positioning and cell polarity. Nat. Rev. Mol. Cell Biol. 2008;9:874–886. [PubMed]
2. Dawe HR, Farr H, Gull K. Centriole/basal body morphogenesis and migration during ciliogenesis in animal cells. J. Cell Sci. 2007;120:7–15. [PubMed]
3. Marshall WF. Basal bodies: platforms for building cilia. Curr. Top Dev. Biol. 2008;85:1–22. [PubMed]
4. Basu B, Brueckner M. Cilia: multifunctional organelles at the center of vertebrate left-right asymmetry. Curr. Top Dev. Biol. 2008;85:151–174. [PubMed]
5. Salathe M. Regulation of mammalian ciliary beating. Annu. Rev. Physiol. 2007;69:401–422. [PubMed]
6. Satir P, Christensen ST. Overview of structure and function of mammalian cilia. Annu. Rev. Physiol. 2007;69:377–400. [PubMed]
7. Sharma N, Berbari NF, Yoder BK. Ciliary dysfunction in developmental abnormalities and diseases. Curr. Top Dev. Biol. 2008;85:371–427. [PubMed]
8. Christensen ST, Pedersen SF, Satir P, et al. The primary cilium coordinates signaling pathways in cell cycle control and migration during development and tissue repair. Curr. Top Dev. Biol. 2008;85:261–301. [PubMed]
9. Kiprilov EN, Awan A, Desprat R, et al. Human embryonic stem cells in culture possess primary cilia with hedgehog signaling machinery. J. Cell Biol. 2008;180:897–904. [PMC free article] [PubMed]
10. Aury J, Jaillon O, Duret L, et al. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature. 2006;444:171–178. [PubMed]
11. Arnaiz O, Cain S, Cohen J, et al. ParameciumDB: a community resource that integrates the Paramecium tetraurelia genome sequence with genetic data. Nucleic Acids Res. 2007;35:D439–D444. [PubMed]
12. Nogales-Cadenas R, Abascal F, Díez-Pérez J, et al. CentrosomeDB: a human centrosomal proteins database. Nucleic Acids Res. 2009;37:D175–D180. [PMC free article] [PubMed]
13. Inglis PN, Boroevich KA, Leroux MR. Piecing together a ciliome. Trends Genet. 2006;22:491–500. [PubMed]
14. Gherman A, Davis EE, Katsanis N. The ciliary proteome database: an integrated community resource for the genetic and functional dissection of cilia. Nat. Genet. 2006;38:961–962. [PubMed]
15. O'Brien KP, Remm M, Sonnhammer ELL. Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. 2005;33:D476–D480. [PMC free article] [PubMed]
16. Li JB, Gerdes JM, Haycraft CJ, et al. Comparative genomics identifies a flagellar and basal body proteome that includes the BBS5 human disease gene. Cell. 2004;117:541–552. [PubMed]
17. Reiter LT, Do LH, Fischer MS, et al. Accentuate the negative: proteome comparisons using the negative proteome database. Fly (Austin) 2007;1:164–171. [PubMed]
18. Smith JC, Northey JGB, Garg J, et al. Robust method for proteome analysis by MS/MS using an entire translated genome: demonstration on the ciliome of Tetrahymena thermophila. J. Proteome Res. 2005;4:909–919. [PubMed]
19. Ostrowski LE, Blackburn K, Radde KM, et al. A proteomic analysis of human cilia: identification of novel components. Mol. Cell Proteomics. 2002;1:451–465. [PubMed]
20. Emanuelsson O, Brunak S, von Heijne G, et al. Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protoc. 2007;2:953–971. [PubMed]
21. Möller S, Croning MD, Apweiler R. Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics. 2001;17:646–653. [PubMed]
22. Smedley D, Haider S, Ballester B, et al. BioMart—biological queries made easy. BMC Genomics. 2009;10:22. [PMC free article] [PubMed]
23. Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
24. Stein LD, Mungall C, Shu S, et al. The generic genome browser: a building block for a model organism system database. Genome Res. 2002;12:1599–1610. [PubMed]
25. Perl template toolkit. Available at: http://template-toolkit.org/
26. Pazour GJ, Agrin N, Leszyk J, et al. Proteomic analysis of a eukaryotic cilium. J. Cell Biol. 2005;170:103–113. [PMC free article] [PubMed]
27. Adams M, Smith UM, Logan CV, et al. Recent advances in the molecular pathology, cell biology and genetics of ciliopathies. J. Med. Genet. 2008;45:257–267. [PubMed]
28. Gerdes JM, Davis EE, Katsanis N. The vertebrate primary cilium in development, homeostasis, and disease. Cell. 2009;137:32–45. [PMC free article] [PubMed]
29. Duriez B, Duquesnoy P, Escudier E, et al. A common variant in combination with a nonsense mutation in a member of the thioredoxin family causes primary ciliary dyskinesia. Proc. Natl Acad. Sci. USA. 2007;104:3336–3341. [PubMed]
30. Loges NT, Olbrich H, Fenske L, et al. DNAI2 mutations cause primary ciliary dyskinesia with defects in the outer dynein arm. Am. J. Hum. Genet. 2008;83:547–558. [PubMed]
31. Omran H, Kobayashi D, Olbrich H, et al. Ktu/PF13 is required for cytoplasmic pre-assembly of axonemal dyneins. Nature. 2008;456:611–616. [PMC free article] [PubMed]
32. Castleman VH, Romio L, Chodhari R, et al. Mutations in radial spoke head protein genes RSPH9 and RSPH4A cause primary ciliary dyskinesia with central-microtubular-pair abnormalities. Am. J. Hum. Genet. 2009;84:197–209. [PubMed]
33. Washington NL, Haendel MA, Mungall CJ, et al. Linking human diseases to animal models using ontology-based phenotype annotation. PLoS Biol. 2009;7:e1000247. [PMC free article] [PubMed]
34. Zhao C, Malicki J. Genetic defects of pronephric cilia in zebrafish. Mech. Dev. 2007;124:605–616. [PubMed]
35. Andersen JS, Wilkinson CJ, Mayor T, et al. Proteomic characterization of the human centrosome by protein correlation profiling. Nature. 2003;426:570–574. [PubMed]
36. Avidor-Reiss T, Maer AM, Koundakjian E, et al. Decoding cilia function: defining specialized genes required for compartmentalized cilia biogenesis. Cell. 2004;117:527–539. [PubMed]
37. Blacque OE, Perens EA, Boroevich KA, et al. Functional genomics of the cilium, a sensory organelle. Curr. Biol. 2005;15:935–941. [PubMed]
38. Broadhead R, Dawe HR, Farr H, et al. Flagellar motility is required for the viability of the bloodstream trypanosome. Nature. 2006;440:224–227. [PubMed]
39. Chen N, Mah A, Blacque OE, et al. Identification of ciliary and ciliopathy genes in Caenorhabditis elegans through comparative genomics. Genome Biol. 2006;7:R126. [PMC free article] [PubMed]
40. Efimenko E, Bubb K, Mak HY, et al. Analysis of xbx genes in C. elegans. Development. 2005;132:1923–1934. [PubMed]
41. Keller LC, Romijn EP, Zamora I, et al. Proteomic analysis of isolated chlamydomonas centrioles reveals orthologs of ciliary-disease genes. Curr. Biol. 2005;15:1090–1098. [PubMed]
42. Kilburn CL, Pearson CG, Romijn EP, et al. New Tetrahymena basal body protein components identify basal body domain structure. J. Cell Biol. 2007;178:905–912. [PMC free article] [PubMed]
43. Laurençon A, Dubruille R, Efimenko E, et al. Identification of novel regulatory factor X (RFX) target genes by comparative genomics in Drosophila species. Genome Biol. 2007;8:R195. [PMC free article] [PubMed]
44. Liu Q, Tan G, Levenkova N, et al. The proteome of the mouse photoreceptor sensory cilium complex. Mol. Cell Proteomics. 2007;6:1299–1317. [PMC free article] [PubMed]
45. Mayer U, Ungerer N, Klimmeck D, et al. Proteomic analysis of a membrane preparation from rat olfactory sensory cilia. Chem. Senses. 2008;33:145–162. [PubMed]
46. Stolc V, Samanta MP, Tongprasit W, et al. Genome-wide transcriptional analysis of flagellar regeneration in Chlamydomonas reinhardtii identifies orthologs of ciliary disease genes. Proc. Natl. Acad. Sci. USA. 2005;102:3703–3707. [PubMed]

Articles from Database: The Journal of Biological Databases and Curation are provided here courtesy of Oxford University Press