Compiling the transcriptome of a cell or tissue is arguably more demanding than establishing the number of gene loci encoded by a given genome sequence [43]. This may mainly be explained by the dynamic nature of mRNA metabolism: a single gene locus frequently gives rise to alternative transcripts through the use of tissue-specific promoters, cryptic splice sites or variable polyadenylation signals [44]. In addition, variation in gene expression is known to occur within and between populations [46], and allele-specific expression, even from non-imprinted genes, appears to be common [48]. Transcriptome definition is further complicated by effects of gender and age on RNA expression [49] as well as by agonal and postmortem factors, which greatly affect RNA integrity and thus frequently influence subsequent analyses [50]. Finally, differences in experimental technologies and data post-processing add a further level of variability. Taken together, the complexities in mRNA metabolism and experimental data handling strongly suggest that there is not a single transcriptome for a given cell or tissue but rather an arbitrary number of individual transcriptomes, each of which needs to be defined by a series of parameters such as age, gender, ethnicity, and cause and time of death of the tissue donor, among many others. It is therefore advisable to aim initially for a reference transcriptome that provides a blueprint of an expression profile within a broadly defined time-frame. Following this line of reasoning, we here present a framework for a first reference transcriptome of the retina/RPE, consisting of 13,037 unique transcripts which broadly characterize the mature state of expression in this tissue.
The present meta-analysis has integrated information from 27 studies employing diverse technologies to identify retinal/RPE transcripts. Among these, SAGE represents a sensitive tool to detect low-level transcription [51], while the PCR-based SSH method is well suited to enrich for differentially expressed genes [36]. The combined use of these approaches together with conventional cDNA library sequencing and microarray-based techniques provides a more solid assessment of gene expression than any single method would alone. For example, SAGE is based on sequencing of hundreds of thousands of short (10, 14, or 21 bp) tags, ideally derived from a unique location of a single transcript. Rare tags could originate from infrequently expressed transcripts but could also reflect minor genomic contamination or minor sequencing errors. For the assembly of the reference retinome we have addressed these concerns by including only those transcripts that have been independently confirmed in a second, unrelated study. This has led to a conservative assembly of the 13K retinome. It should be kept in mind, however, that this procedure likely excludes a number of authentic transcripts. This is illustrated by the finding that the 15K retinome, which comprises 15,645 transcripts including those found in only a single study (Table ), contains an additional five of the 102 known retinal disease genes (RHOK
) not included in the 13K retinome. Similarly, an additional three genes (RHOK
) involved in the vitamin A/phototransduction pathway are part of the 15K but not the 13K retinome. With additional transcription data on the retina/RPE becoming available, a second generation retinome map will need to address this issue.
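The confirmation rule described above (retain only transcripts reported by at least two independent studies) can be illustrated with a minimal sketch. The study labels and transcript identifiers below are hypothetical toy data, and "independent" is simplified here to "distinct study identifier"; this is not the pipeline actually used for the retinome.

```python
from collections import defaultdict


def build_retinomes(study_hits, min_studies=2):
    """Return (conservative, permissive) transcript sets.

    study_hits: mapping of study id -> iterable of transcript ids.
    A transcript enters the conservative set only if it was reported
    by at least `min_studies` distinct studies (assumed threshold: 2).
    """
    support = defaultdict(set)
    for study, transcripts in study_hits.items():
        for transcript in transcripts:
            support[transcript].add(study)
    permissive = set(support)  # analogous to the 15K retinome
    conservative = {t for t, studies in support.items()
                    if len(studies) >= min_studies}  # analogous to the 13K retinome
    return conservative, permissive


# Hypothetical toy input: three studies using different technologies.
toy_hits = {
    "sage_study": ["tx1", "tx2", "tx3"],
    "ssh_study": ["tx1", "tx4"],
    "array_study": ["tx2", "tx1"],
}
conservative, permissive = build_retinomes(toy_hits)
single_study_only = permissive - conservative

# Consistency check on the counts reported in the text:
# 13,037 confirmed + 2,608 single-study transcripts = 15,645.
reported_total = 13037 + 2608
```

In this toy case `single_study_only` holds the transcripts that the conservative rule discards, which is the class of potentially authentic but unconfirmed transcripts discussed above.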
The estimation of transcriptome size represents one of the fundamental questions in molecular biology. Early studies using reassociation kinetics calculated the number of distinct mRNA transcripts present in various mouse tissues to be between 11,500 and 12,500 [52]. Initial SAGE analyses led to the conclusion that the number of different transcripts observed in normal and tumorous tissue may lie between 14,247 and 20,471 [53]. Recent data from comprehensive EST sequencing of a number of tissues including brain, breast, colon, head/neck, kidney, lung, ovary, prostate, and uterus suggest expression of between 7,500 and 13,500 distinct genes for each tissue [54]. Although the size of the reference retinome is consistent with these estimates, the question of whether the current compilation adequately represents the transcripts of this tissue remains open. We have addressed this by defining a number of gene groups with known expression in retina/RPE and comparing these to the reference retinome. Genes exclusively expressed in retina/RPE are highly represented in the retinome (100%), as are the mainly tissue-specific genes known to play a role in the vitamin A/phototransduction pathway (93%) (Table ). A partial list of 260 genes whose encoded proteins were shown by immunohistochemistry to be expressed in the retina/RPE (but may also be present in other tissues) was represented in the reference retinome at a rate of approximately 79%. Similar numbers were obtained for the retinome coverage of retinal disease genes (85%). From these data we conclude that the 13K reference retinome is highly representative of retina/RPE-expressed genes and may describe as much as 90% of the transcript complement in the adult state.
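The representation rates quoted above are, in essence, the share of each known gene group that is recovered in the reference retinome. A minimal sketch of that calculation follows, using hypothetical placeholder identifiers rather than the actual gene lists.

```python
def coverage(gene_group, retinome):
    """Return (percent of gene_group found in retinome, missing genes)."""
    group = set(gene_group)
    if not group:
        raise ValueError("gene group must not be empty")
    found = group & set(retinome)
    return 100.0 * len(found) / len(group), group - found


# Hypothetical toy data: a four-gene group, three of which are in the retinome.
toy_retinome = {"geneA", "geneB", "geneC", "geneD"}
toy_pathway_group = ["geneA", "geneB", "geneD", "geneE"]

percent, missing = coverage(toy_pathway_group, toy_retinome)
```

Returning the missing genes alongside the percentage makes it easy to see which members of a group are absent from the conservative set, as was done for the disease and pathway genes that appear only in the 15K retinome.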
Another point of interest concerns the proportion of retinome transcripts which is uniquely expressed in this tissue. Brentani et al. [54] estimate that any two tissues may share between 73% and 84% of their transcriptomes. Comparing transcription in three tissues (breast, colon, head/neck), the authors found overlapping expression in 47% of transcripts. To investigate this in more detail, we have compiled three partial transcriptomes from heart (n = 3,660), liver (n = 5,780) and prostate (n = 7,018) by applying the same stringent criteria as defined for the retinome. Limited by the size of the partial heart transcriptome, we determined 2,330 transcripts (termed "housekeeping" genes) to be expressed in all four tissues (i.e. 64% of the heart transcriptome). Comparing the retinome to any of the partial transcriptomes revealed overlapping gene profiles of between 92% and 95%. This would suggest that only a minor proportion of retinome transcripts is indeed unique to the retina/RPE. Thus far, we have identified a group of so-called "retinome-enriched" genes comprising 5,051 transcripts which are not present in the partial transcriptomes of heart, liver and prostate. This group most likely contains additional "housekeeping" or tissue-restricted transcripts and needs further adjustment by more refined in-silico normalization to comprehensive reference transcriptomes of other tissues.
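The tissue comparison above reduces to set operations on transcript lists: an intersection across all tissues for the "housekeeping" group, and a set difference for the "retinome-enriched" group. The sketch below uses hypothetical toy identifiers. Expressing pairwise overlap relative to the smaller of the two sets is an assumption made here for illustration, since the text does not state which denominator was used.

```python
def shared_fraction(a, b):
    """Fraction of the smaller set that is also present in the other set.

    Assumption: overlap is expressed relative to the smaller transcriptome.
    """
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("both transcriptomes must be non-empty")
    return len(a & b) / smaller


# Hypothetical toy transcriptomes.
tissues = {
    "retina": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "heart": {"g1", "g2", "g7"},
    "liver": {"g1", "g2", "g3", "g8"},
    "prostate": {"g1", "g2", "g4", "g9"},
}

# "Housekeeping": present in all four tissues.
housekeeping = set.intersection(*tissues.values())

# "Retinome-enriched": in the retina set but in none of the other tissues.
other_tissues = set.union(*(s for name, s in tissues.items() if name != "retina"))
retina_enriched = tissues["retina"] - other_tissues

retina_vs_heart = shared_fraction(tissues["retina"], tissues["heart"])

# Arithmetic check on the reported figure: 2,330 of 3,660 heart transcripts.
reported_heart_share = round(100 * 2330 / 3660)
```

Because the enriched group is defined only against the tissues that happen to be available, it shrinks as further reference transcriptomes are added, which is why the text treats the 5,051 transcripts as a provisional upper bound.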
Highly expressed genes, including those with a ubiquitous or a tissue-specific transcription profile, have been shown to cluster in chromosomal regions of increased gene expression (termed RIDGEs) [55]. Functionally, this higher-order structure has been related to transcriptional regulation [56]. To search for a possible correlation, we have determined the chromosomal distribution of the reference retinome independent of gene density. Our data show good agreement with the previously established regional expression map defining approximately 30 RIDGEs within the human genome. Overlaps are most evident for chromosomes 6, 9, 11, 17, and 19. From this we conclude that the majority of transcripts assembled in the reference retinome share characteristics of the RIDGEs, including moderate to high-level expression. This finding may be ascribed to the stringent selection criteria we have applied to assemble the reference retinome, which excluded all transcripts (n = 2,608) that were reported in only a single study. Conversely, the RIDGE-like pattern of the reference retinome could be an indication that missing transcripts may have features compatible with chromosomal domains defined as anti-RIDGEs [56]. As opposed to RIDGEs, clustering of genes in anti-RIDGEs seems to be associated with significantly decreased expression [56]. Although they are likely to make up only a small fraction of a transcriptome, identifying such low-abundance transcripts is likely to require significant resources if more complete transcriptomes are to be compiled.
To provide positional candidates for retinal disease genes, we have mapped the transcripts representing the reference retinome to the minimal regions defined for 42 retinal disease loci with as yet unidentified gene mutations. To further limit the number of candidate genes, in particular for loosely defined disease loci such as RP28, we have similarly integrated the "retinome-enriched" transcripts. This also takes into account the fact that approximately 50% of retinal disease genes are retina/RPE-specific [58]. For 41 of 42 unknown disease genes we have now identified strong candidates, although for some disease loci including AIED
, and USH2B
, the number of candidates may still exceed capacities of most laboratories for direct analysis. For other disease loci (e.g. BCD
), a restricted number of candidates are now available (see additional File 14