The human body displays a complex array of biological functions mediated by the expression of mRNA and proteins. The construction of complex organs, such as the kidney or the brain, is far from understood, and there is a need to dissect the expression of genes and proteins systematically using quantitative methods. The complete sequence of the human genome (Lander et al, 2001; Venter et al, 2001) has facilitated such studies and opened up the possibility of whole-genome analysis at both the RNA and the protein levels. The ultimate goal of such an endeavor is to define the quantitative levels of the transcriptome and the proteome in various cell types in human tissues and organs.
We have recently described an antibody-based immunohistochemistry analysis of 48 human organs and tissues (Ponten et al, 2009) based on proteins corresponding to one third of all human genes, showing that a large portion of the analyzed proteins were detected across the tissues in a ubiquitous manner. This led to the suggestion that tissue specificity is achieved by precise regulation of protein levels in space and time, and that different tissues in the body acquire their unique characteristics by controlling not which proteins are expressed but how much of each is produced (Ponten et al, 2009). Similarly, a detailed study of 1% of the human genome showed that chromosomes are ubiquitously transcribed and that the majority of all bases are included in primary transcripts (Birney et al, 2007). These results have recently been supported by deep sequencing, demonstrating that a majority of the transcripts can be detected in a human cell line (Sultan et al, 2008) and that a large fraction (75%) of the human protein-coding genes are expressed in most tissues (Ramskold et al, 2009).
Most analyses of whole proteomes by mass spectrometry have so far been performed on yeast cells (Ghaemmaghami et al, 2003; de Godoy et al, 2008; Picotti et al, 2009). Analysis of the human proteome using mass spectrometry has so far covered only a moderate fraction of the complete proteome. This limited proteome coverage, combined with the limited quantification accuracy of array-based methods for RNA analysis, has resulted in relatively low correlations between RNA and protein levels (de Sousa Abreu et al, 2009), as exemplified by studies on yeast (Griffin et al, 2002; Greenbaum et al, 2003) and human cancers (Chen et al, 2002). Recent technological advances in the field of mass spectrometry make it possible to perform deep proteome analysis of complex organisms as well (Cox and Mann, 2007), using quantitative mass spectrometry methods such as the stable isotope-based SILAC method (Ong et al, 2002; Mann, 2006). The technological developments of RNA-seq, together with accurate SILAC quantification, enable a global comparative analysis of RNA and protein levels, and of changes in those levels, in higher eukaryotes such as humans.
A complication in global comparisons of RNA and protein levels in tissues and organs is the multitude of cell types and developmental stages present in most tissues. We have therefore decided to compare the human transcriptome and the proteome with quantitative methods in three established human cell lines of different functional origins, allowing an analysis of relatively homogeneous cellular populations. Although caution needs to be taken due to the artificial nature of cell lines grown in vitro, the aim of the study was to categorize all the protein-coding genes based on their cell specificity and expression levels. The analysis was performed using deep sequencing of mRNA, proteomics analysis using triple-isotope-labeled SILAC mass spectrometry, and antibody-based confocal microscopy (Barbe et al, 2008; Berglund et al, 2008). The RNA-seq data give the absolute number of reads per kilobase for each gene. The triple-SILAC MS data accurately determine the relative abundance of each of the proteins in the three cell lines. Furthermore, the summed peptide intensities roughly estimate the absolute amount of each of the identified proteins. Since the cells were harvested during exponential growth, steady-state levels of RNA and proteins could be analyzed and compared. The study further allowed quantitative analysis of changes on the RNA and protein levels between the three cell lines, and allowed us to annotate the expression of all genes across these functionally different cells.
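To make the quantities in this comparison concrete, the following Python sketch shows one way to put read counts and SILAC intensities on comparable scales. It is an illustration under stated assumptions, not the pipeline used in this study: read counts are normalized to reads per kilobase per million mapped reads (a common convention, assumed here), relative changes between two cell lines are expressed as log2 ratios, and agreement between RNA and protein changes is summarized with a Spearman rank correlation. The gene names, cell line labels, read counts, and intensities are all hypothetical.

```python
import math

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads (assumed convention)."""
    return read_count / (gene_length_bp / 1_000) / (total_mapped_reads / 1_000_000)

def log2_ratio(a, b):
    """Relative change between two cell lines on a log2 scale."""
    return math.log2(a / b)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical totals of mapped reads per cell line.
TOTAL_READS = {"line1": 10_000_000, "line2": 10_000_000}

# Hypothetical genes: length, read counts, and summed SILAC peptide intensities.
GENES = {
    "GENE_A": {"length_bp": 2000, "reads": {"line1": 400, "line2": 100},
               "silac": {"line1": 8.0e6, "line2": 2.0e6}},
    "GENE_B": {"length_bp": 1000, "reads": {"line1": 50, "line2": 100},
               "silac": {"line1": 1.0e6, "line2": 1.5e6}},
    "GENE_C": {"length_bp": 4000, "reads": {"line1": 800, "line2": 800},
               "silac": {"line1": 3.0e6, "line2": 3.3e6}},
}

rna_changes, protein_changes = [], []
for name, gene in GENES.items():
    r1 = rpkm(gene["reads"]["line1"], gene["length_bp"], TOTAL_READS["line1"])
    r2 = rpkm(gene["reads"]["line2"], gene["length_bp"], TOTAL_READS["line2"])
    rna_changes.append(log2_ratio(r1, r2))
    protein_changes.append(log2_ratio(gene["silac"]["line1"], gene["silac"]["line2"]))

rho = spearman(rna_changes, protein_changes)
print(f"Spearman correlation of RNA vs protein log2 changes: {rho:.2f}")
```

Log ratios are used because SILAC measures relative abundance between labeled samples, so a change between cell lines is the natural unit of comparison; a rank-based correlation is chosen here only because it does not assume a linear relationship between RNA and protein changes.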