LIFEdb was initially developed to publish data on full-length cDNAs and the subcellular localization of the encoded proteins (4). During the past two years the content of the database has grown steadily and currently comprises data on 1500 cDNAs and on the localizations and microscopic images of some 1000 proteins. We have now integrated a first dataset from a cell-based screening assay that addresses the influence of protein overexpression on cell proliferation (5). This screen initially comprised 103 proteins and is the first posting of such high-throughput data in an open-access database. Expression constructs encoding the proteins of interest, fused to green fluorescent protein derivatives at either their N- or C-terminus, were transfected into mammalian cells, and the effects of protein overexpression on G1/S-phase transition were measured. This was done with a high-content screening microscope by monitoring the incorporation of BrdU through immunofluorescent staining. The data were statistically analysed using a linear model correcting for systematic and random errors. This resulted in a
Z-score, based on a smoothed local regression function, for each single experiment. Proteins with positive values of Z are considered activators and those with negative values repressors of cell proliferation. The result for each investigated protein was calculated as the median of the Z-scores of all replicate experiments carried out with the respective ORF. To obtain a measure of significance (P-value), the set of Z-scores of one protein was compared with the overall distribution of Z-scores for all proteins via the two-sided Wilcoxon test. Results from the cellular screen can be queried through a dedicated search field, where users can specify whether activators, repressors or both are to be displayed and can define a cut-off for the minimal accepted P-value. Results are displayed as an extra column showing the median Z-score and the accompanying P-value. The distribution of the Z-scores for each ORF can be viewed as a histogram in an extra window that is accessible via a hyperlink. There, the data on N-terminal fusion constructs (CFP–ORF) are displayed in dark blue and values from C-terminal fusion constructs (ORF–YFP) in green. The number of proteins with attached information from functional profiling will increase continuously as more proteins are screened.
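The per-protein summary described above (median Z-score, activator/repressor call, two-sided Wilcoxon P-value, and filtering by class and P-value cut-off) can be illustrated with a minimal sketch. Several points are assumptions rather than the published procedure: the Wilcoxon test is taken to be the rank-sum variant and is computed with a normal approximation without tie correction, the P-value cut-off is treated as an upper bound, and all function and field names (`summarize_orf`, `filter_hits`, `median_z`, and so on) are hypothetical.

```python
import math
from statistics import median


def average_ranks(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_value(sample, background):
    """Two-sided rank-sum P-value (assumed variant; normal approximation,
    no tie correction)."""
    n1, n2 = len(sample), len(background)
    ranks = average_ranks(list(sample) + list(background))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def summarize_orf(z_scores, background):
    """Median Z-score, class call and P-value for one ORF's replicates."""
    med = median(z_scores)
    if med > 0:
        label = "activator"
    elif med < 0:
        label = "repressor"
    else:
        label = "neutral"  # a median of exactly 0 is not covered by the text
    return {"median_z": med,
            "p_value": rank_sum_p_value(z_scores, background),
            "class": label}


def filter_hits(results, mode="both", max_p=0.05):
    """Keep activators, repressors or both whose P-value is at most max_p
    (assumption: the cut-off acts as an upper bound)."""
    wanted = {"activators": {"activator"},
              "repressors": {"repressor"},
              "both": {"activator", "repressor"}}[mode]
    return [r for r in results
            if r["class"] in wanted and r["p_value"] <= max_p]
```

With four replicate Z-scores that all lie above a background of eight values, for example, `summarize_orf` returns an activator call with a P-value below 0.05, and `filter_hits(..., mode="repressors")` would exclude it.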
In addition to these experimental results, we included data on the relative tissue expression of the genes under investigation (‘Electronic Northern’). The calculation is based on the number of ESTs for every gene, most of which were sequenced in large-scale projects (6–10). We used the UniGene (11) EST dataset together with the eVOC ontologies (12), which curate this dataset in a detailed manner, to obtain a controlled tissue vocabulary. dbEST library mappings to the ontologies were obtained from the eVOC website (http://www.evocontology.org). The first-level terms of the ontology ‘Anatomical System’ were used for the tissue definitions (for a list, see http://www.inet.dkfz-heidelberg.de/LIFEdb/ENorthernLegend.htm). All EST libraries assigned to the respective term (or sub-term) were pooled. cDNAs were mapped to UniGene cluster IDs via the GenBank accession number in the UniGene dataset.
The relative gene expression of one transcript was calculated from the number of ESTs in the respective UniGene cluster that belong to each ontology term, which was then normalized for each term (for details on the calculation, see http://www.dkfz.de/mga/groups.asp?siteID=160).
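The exact normalization is documented at the URL above and is not reproduced here. As an illustration only, the sketch below assumes one common form of such a calculation: the fraction of a tissue term's pooled ESTs that belong to the gene's UniGene cluster, divided by the gene's fraction across all terms. Under this assumption values above 1 indicate relative overexpression and values below 1 relative under-expression. The function name, argument names and tissue labels are hypothetical.

```python
def relative_expression(cluster_counts, term_totals):
    """Relative expression per tissue term for one UniGene cluster.

    cluster_counts: ESTs of the cluster per ontology term (pooled libraries).
    term_totals:    all ESTs per ontology term (pooled libraries).
    Returns None for terms without any EST of the cluster.
    ASSUMPTION: ratio of per-term frequency to overall frequency; the
    published calculation may differ.
    """
    cluster_total = sum(cluster_counts.get(t, 0) for t in term_totals)
    all_total = sum(term_totals.values())
    if cluster_total == 0 or all_total == 0:
        return {term: None for term in term_totals}

    overall_freq = cluster_total / all_total
    result = {}
    for term, total in term_totals.items():
        count = cluster_counts.get(term, 0)
        if count == 0 or total == 0:
            result[term] = None  # no expression evidence in this term
        else:
            result[term] = (count / total) / overall_freq
    return result


# Hypothetical counts for one cluster and three pooled tissue terms
values = relative_expression(
    {"nervous": 30, "digestive": 10},
    {"nervous": 1000, "digestive": 1000, "respiratory": 2000},
)
```

In this toy input the cluster accounts for 1% of all ESTs but 3% of the ESTs in the first term, so that term receives a value of about 3, the second term a value of about 1, and the third term no value.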
The datasets, mappings and calculations are updated when new versions of the respective datasets become available.
The expression for each gene is shown for the terms of the anatomical system as colored boxes in the table output. Boxes are labeled with an abbreviation of the underlying definition. Relative gene expression values are indicated by different colors. Values <1 (relative ‘under-expression’) are displayed in blue and values >1 are in red (relative ‘overexpression’). Darker colors represent a higher degree of under- or overexpression. Boxes in white indicate that no UniGene expression of the respective gene was identified in that particular group of tissues. Information on the underlying numbers (ESTs in the respective cluster and tissues) is displayed upon moving the mouse over the boxes. This information is included in the XML output.
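The colour scheme can be sketched as a small mapping from a relative expression value to a hue and a shade level. The hues (blue, red, white) follow the description above; the number of shade levels and the use of log2 steps to decide how dark a box is are assumptions made for illustration, as is the function name `box_color`.

```python
import math


def box_color(value, step=1.0, max_level=3):
    """Map a relative expression value to (hue, shade level).

    None -> white box (no UniGene expression found for the tissue group).
    <1   -> blue, >1 -> red; higher levels mean darker shades.
    ASSUMPTION: shade levels are derived from |log2(value)| in units of
    `step`, capped at `max_level`; the actual binning is not given in the text.
    """
    if value is None:
        return ("white", 0)
    if value == 1:
        return ("neutral", 0)  # exactly 1 is not covered by the description
    hue = "blue" if value < 1 else "red"
    level = 1 + int(abs(math.log2(value)) // step)
    return (hue, min(level, max_level))
```

For example, a value of 1.5 maps to the lightest red, 0.5 to a medium blue, and a strongly under-expressed value such as 0.01 to the darkest blue.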