|Home | About | Journals | Submit | Contact Us | Français|
Gene expression analysis is a widely used and powerful method for investigating the transcriptional behavior of biological systems, for classifying cell states in disease and for many other purposes. Recent studies indicate that common assumptions currently embedded in experimental and analytical practices can lead to misinterpretation of global gene expression data. We discuss these assumptions and describe solutions that should minimize erroneous interpretation of gene expression data from multiple analysis platforms.
Global gene expression analysis provides quantitative information about the population of RNA species in cells and tissues. It is an exceptionally powerful tool of molecular biology that is used to explore basic biology, diagnose disease, facilitate drug discovery and development, tailor therapeutics to specific pathologies and generate databases with information about living processes. Consequently, expression analysis is among the most commonly used methods in modern biology; there are over 750,000 expression datasets in the NCBI Gene Expression Omnibus (GEO) public database (Edgar et al., 2002).
Global gene expression analysis uses DNA microarrays, RNA-Seq, and other methods to measure the levels of RNA species in biological systems (Geiss et al., 2008; Heller, 2002; Lockhart and Winzeler, 2000; Ozsolak and Milos, 2011; Schena et al., 1998; Wang et al., 2009). DNA microarrays, which have been most frequently used for expression analysis, consist of millions of individual oligonucleotide probes fixed to a solid surface. The oligonucleotide probes typically have sequences representative of known RNA species and are generally used to quantitate the relative levels of RNA species that hybridize to the probes. Massively parallel sequencing technologies, developed more recently, provide a measure of the frequency of RNA species through sequencing of RNA-derived cDNA populations. Other approaches, such as digital molecular barcoding, represent a fusion of the hybridization and counting approaches. For instance, the nCounter digital quantification platform relies on hybridization of labeled probes to RNA molecules and single molecule imaging to provide a measurement of the frequency of particular RNA species.
Almost all global expression analysis involves isolation of RNA from two or more cellular sources, introducing similar amounts of RNA from the sources into the experimental platform and analyzing the data using algorithms that normalize the signal from the samples (Kulkarni, 2011; Mortazavi et al., 2008; Quackenbush, 2002; Schulze and Downward, 2001). If the cellular sources produce equivalent amounts of RNA/cell, and the yields of RNA and its derivatives are equivalent throughout experimental manipulation, then normalized expression data should produce an accurate representation of the relative levels of each gene product.
We recently found that cells with high levels of c-Myc can amplify their gene expression program, producing 2–3 times more total RNA and generating cells that are larger than their low-Myc counterparts (Lin et al., 2012; Nie et al., 2012). This discovery has led us to question the common assumption that cells produce similar levels of RNA/cell and the general practice of introducing similar amounts of total RNA into analysis platforms without including standardized controls that would reveal transcriptional amplification or repression. As described below, it is likely that this assumption and practice has led to erroneous interpretations. We describe here an experimental approach to genome-wide analysis of RNA expression that is more likely to produce accurate assessments of changes in steady-state levels of RNA.
Consider two different models for changes in gene expression (Figure 1). In the first, RNA levels for a minority of genes are elevated, but the levels of total RNA in the two cells is similar (Figure 1A). The absolute levels of most RNA species are therefore similar in the two cells, and when the total signal for the RNA population is normalized by standard algorithms, the resulting expression data appropriately indicates an increase in the relative RNA levels for a set of genes (Figure 1B). In the second model, the two cells express a similar set of genes, but one cell produces and accumulates 2–3 times more RNA/gene for many of the same genes expressed in the other cell (Figure 1C), an effect that has been termed transcriptional amplification (Lin et al., 2012; Nie et al., 2012). In the conventional approach to expression analysis, similar amounts of RNA from the two cells are introduced into the assay, thus masking the fact that one of the cells has 2–3 times more RNA than the other (Figure 1D). This potential source of error is typically overlooked because of the commonly believed, though rarely stated, assumption that the absolute amount of total mRNA in each cell is similar across different cell types or experimental perturbations. Furthermore, the most commonly used analysis methods are primarily intended to account for technical variations in signal to noise and assume that the signals for different samples from different experiments should be scaled to have the same median or average value, or that the distributions of signal intensities for each experiment within a set should all be the same (Bolstad et al., 2003; Huber et al., 2002; Irizarry et al., 2003; Kalocsai and Shams, 2001; Li and Wong, 2001; Reimers, 2010; Wu et al., 2004). Normalization of signal from cells that experience transcriptional amplification can thus have the net result of equalizing values that are actually different and producing the erroneous perception that some genes have elevated expression while a similar number of genes have reduced expression.
To produce a reliable gene expression analysis protocol that addresses this experimental and data normalization issue, we investigated the use of spiked-in standards (Benes and Muckenthaler, 2003; Hartemink et al., 2001; Hill et al., 2001; Jiang et al., 2011; Mortazavi et al., 2008). We implemented an approach that uses spiked-in RNA standards to allow normalization to cell number and permit correction for differences in yields during experimental manipulation (Figure 2A). We performed genome-wide analysis on P493-6 cells expressing low or high levels of c-Myc (Pajic et al., 2000; Schuhmacher et al., 1999) where cells with high levels of the transcription factor were found to produce 2-3 -fold higher levels of the same RNA species found in cells with low levels (Lin et al., 2012). Cell number was determined by counting cells using C-Chip disposable hemocytometers (Digital Bio, Hopkinton, MA) and equivalent numbers of high- and low-Myc cells were harvested. The DNA content of the two samples was measured and found to be equivalent. Following total RNA extraction, spiked-in RNA standards were added in proportion to the number of cells present in the sample. Samples were then split and prepared for microarray, RNA-seq and digital analysis using NanoString.
DNA-microarrays were first used to compare the high-Myc versus low-Myc cell RNA populations (Figure 2B; Table S1). Similar amounts of RNA from the low- and high-Myc cells were introduced into the Affymetrix DNA microarray assay following the manufacturer’s protocol, which is the most common approach used in expression analysis. The resulting data were processed by using standard normalization methods and by using the spike-in standards for normalization. The results obtained using standard approaches can be interpreted to mean that the expression levels of some genes is unchanged, while others increase or decrease (Figure 2B). The interpretation is quite different when the same data is normalized using spike-in standards that reflect cell number: over 90% of the genes show increases in expression in high-Myc cells relative to low-Myc cells (Figure 2B).
RNA-Seq analysis was then used to compare the high-Myc versus low-Myc cell RNA populations (Figure 2C; Table S2). Similar amounts of RNA from the low- and high-Myc cells were subjected to sequencing. The resulting data were processed by using standard normalization methods and by using the spike-in standards for normalization. Again, the results obtained using standard approaches suggest that the expression levels of some genes is unchanged, while others increase or decrease (Figure 2C), yet when the same data is normalized using spike-in standards that reflect cell number, there is an increase in transcript levels for the vast majority of genes (Figure 2C).
We then used whole-sample, digital gene expression quantification (NanoString, Seattle, WA) to compare transcript levels in the high-Myc and low-Myc cells. In one experiment, equal amounts of RNA from the high- and low-Myc cells were compared using this method. The results of this analysis suggest that the expression levels of some genes is unchanged, while others increase or decrease. In a second experiment, equal numbers of high- and low-Myc cells were used to prepare RNA and these total RNA populations were subjected to digital gene expression quantification. Here the data indicate there is an increase in transcript levels for the vast majority of genes in high- vs. low-Myc cells (Figure 2D; Table S3).
In summary, three of the major technologies typically used for global gene expression analysis - microarray, RNA-sequencing and digital quantification - detect a widespread increase in transcripts/cell in cells that experience transcriptional amplification by c-Myc. Significantly, all three of these major technologies used for gene expression fail to detect the widespread increase of transcription when inappropriate normalization methods are used. Instead, they erroneously suggest the interpretation that a similar number of genes show increases and decreases in expression.
Our results indicate that spike-in controls of the type described here are a robust, cross-platform method to allow normalization to cell number and thus enable more accurate detection of differential gene expression and changes in gene expression programs. The clear implication is that the use of spike-in controls normalized to cell number should become the default standard for all expression experiments, as opposed to their more limited use in experiments where gross changes in RNA levels are already anticipated, as exemplified by transcription shutdown experiments (Bar-Joseph et al., 2012). When cell counting may be problematic, as for expression experiments from solid tumors or tissues, DNA content may be used as a surrogate if ploidy and DNA replication profiles are also characterized to prevent the introduction of a DNA content-based artifact.
The discovery of transcriptional amplification and the realization that common experimental methods may lead to erroneous interpretation of gene expression experiments has implications for much current biological research. How prevalent is misinterpretation of genome-wide expression data due to the assumption that cells produce similar levels of total RNA? The answer is likely related to the prevalence of regulatory mechanisms that globally amplify or suppress transcription. What are the implications for classifying cell states in disease? Significant effort is being devoted to expression profiling cancer cells and these studies use standard normalization methods (Alizadeh et al., 2000; Beer et al., 2002; Berger et al., 2010; Bhattacharjee et al., 2001; Bittner et al., 2000; Golub et al., 1999; Lapointe et al., 2004; Northcott et al., 2012; Ramaswamy et al., 2001; Ross et al., 2000; Schmitz et al., 2012; Shipp et al., 2002; Su et al., 2001; The Cancer Genome Atlas TCGA, 2012; van‘t Veer et al., 2002; van de Vijver et al., 2002; Yeoh et al., 2002). Because c-Myc expression occurs at widely varying levels in various tumor cells, transcriptional amplification is likely having a profound impact on cancer cell signatures. Where expression data is being used to gain insights into cancer cell behavior and regulation, it should be interpreted with added caution.
P493-6 cells were kindly provided by Chi Van Dang, University of Pennsylvania. Cells were propagated in RPMI-1640 supplemented with 10% fetal bovine serum and 1% GlutaMAX (Invitrogen, 35050-061). The conditional pmyc-tet construct in P493-6 cells was repressed with 0.1μg/mL tetracycline (Sigma, T7660) for 72 hours. Cells were then washed three times with RPMI-1640 medium containing 10% tetracycline system approved FBS (Clontech, 631105) and 1% GlutaMAX and re-cultured in tetracycline-free culture conditions. All experiments were performed in the absence of EBNA2 activation. Cell numbers were determined by manually counting cells using C-Chip disposable hemocytometers (Digital Bio, DHC-N01) prior to lysis and RNA extraction.
Ten million P493-6 cells were homogenized in 1 mL of TRIzol Reagent (Life Technologies, 15596-026), purified using the mirVANA miRNA isolation kit (Ambion, AM1560) following the manufacturer’s instructions and re-suspended in 100μL nuclease-free water (Ambion, AM9938). Total RNA was spiked-in with ERCC ExFold RNA spike-in controls, treated with DNA-freeTM DNase I (Ambion, AM1906) and analyzed on Agilent 2100 Bioanalyzer for integrity. The external control spike-ins used in the microarray and RNA-Seq analysis were obtained from the ERCC ExFold RNA Spike-In kit (Ambion, 4456739). The ERCC RNA Spike-In Control Mixes used here comprise a set of 92 polyadenylated transcripts that mimic natural eukaryotic mRNAs. The RNAs range in size from 250 – 2000 nucleotides in length and span an approximately 106-fold concentration range.
After extracting total RNA from equal numbers of cells, a fixed amount of diluted ERCC Spike-In Mix #1 was added. The amount of spike-in added was calibrated to the RNA yield of the high-Myc cells to ensure the spike-in signal was in the appropriate dynamic range (ERCC User Guide, Table 4). For these experiments, 1μl of a 1:10 dilution of Mix #1 was added to total RNA extracted from 1 × 106 cells. RNA with the RNA Integrity Number (RIN) above 9.8 was used for library generation for RNA-Seq or hybridized to GeneChip® PrimeView Human Gene Expression Arrays (Agilent), using 10μg or 100ng of total RNA, respectively.
Spike-in controls can also be added prior to RNA purification, if desired, but ideally, the controls used in that case should have additional features of mRNAs, such as 5′ caps, that would limit the potential for degradation during lysis.
For microarray analysis, 100ng of total RNA containing ERCC ExFold Mix #1 RNA spike-in controls (see above) was used to prepare biotinylated aRNA (cRNA) according to the manufacturer’s protocol (3′ IVT Express Kit, Affymetrix 901228). GeneChip arrays (Primeview, Affymetrix 901837) were hybridized and scanned according to standard Affymetrix protocols. All samples were processed in technical duplicate. Images were extracted with Affymetrix GeneChip Command Console (AGCC), and analyzed using GeneChip Expression Console. A Primeview CDF that included probe information for the ERCC controls, provided by Affymetrix, was used to generate. CEL files. We processed the CEL files using standard tools available within the affy package in R. The CEL files were processed with the expresso command to convert the raw probe intensities to probeset expression values. The parameters of the expresso command were set to generate Affymetrix MAS5-normalized probeset values. We used a loess regression to re-normalize these MAS5 normalized probeset values, using only the spike-in probesets to fit the loess. The affy package provides a function, loess.normalize, which will perform loess regression on a matrix of values (defined using the parameter mat) and allows for the user to specify which subset of data to use when fitting the loess (defined using the parameter subset, see the affy package documentation for further details). For this application the parameters mat and subset were set as the MAS5-normalized values and the row-indices of the ERCC control probesets, respectively. The default settings for all other parameters were used. The result of this was a matrix of expression values normalized to the control ERCC probes. The probeset values from the duplicates were averaged together and the log2 fold change from the low-Myc to the high-Myc samples are shown.
Using 10μg of total RNA containing ERCC ExFold Mix #1 RNA spike-in controls (see above), sequencing libraries were prepared according to the following protocol. Polyadenylated RNA was purified by two rounds of selection using Dynabeads® mRNA Purification Kit for mRNA Purification from total RNA (Life Technologies, 610-06) following the manufacturer instructions. This resulting RNA was then further processed for RNA-Seq assays. Briefly, polyadenylated RNA was fragmented with divalent cations under elevated temperature. First strand cDNA synthesis was performed with random hexamers and Superscript III reverse transcriptase (Life Technologies, 18080-051). Second strand cDNA synthesis was performed using RNAse H and DNA Polymerase I. In the second-strand synthesis reaction, dTTP was replaced with dUTP. Following cDNA synthesis, the double stranded products were end repaired, a single “A” base was added, and Illumina PE adaptors were ligated onto the cDNA products. The ligation products with an average size of 300 bp were purified using agarose gel electrophoresis. Following gel purification, the strand of cDNA containing dUTP was selectively destroyed during incubation of purified double-stranded DNA with HK-UNG (Epicentre, HU59100). The adapter ligated single-stranded cDNA was then amplified with 15 cycles of PCR and PCR products were purified using gel electrophoresis. These RNA-Seq libraries were subsequently sequenced on Illumina HiSeq 2000. Sequences were aligned using Bowtie (version 0.12.2) to build version NCBI36/HG18 of the human genome where the sequences of the ERCC synthetic spike-in RNAs (http://tools.invitrogen.com/downloads/ERCC92.fa) had been added. The RPKM (Reads Per Kilobase of exon per Million) was then computed for each gene and synthetic spike-in RNA. We used a loess regression to re-normalize the RPKM values, using only the spike-in values to fit the loess. The affy package in R provides a function, loess.normalize, which will perform loess regression on a matrix of values (defined using the parameter mat) and allows for the user to specify which subset of data to use when fitting the loess (defined using the parameter subset, see the affy package documentation for further details). For this application the parameters mat and subset were set as a matrix of all RPKM values and the row-indices of the ERCC spike-ins, respectively. The default settings for all other parameters were used. The result of this was a matrix of RPKM values normalized to the control ERCC spike-ins. The log2 fold change from the low-Myc to the high-Myc samples were then calculated and shown as a heatmap.
For digital gene expression using NanoString nCounter Gene Expression CodeSets, 1 × 106 cells were collected and lysed directly either in 100μL RLT buffer (Qiagen, 74104) to yield a concentration of 10,000 cells per μL or in 500μL Lysis Buffer using the mirVANA miRNA isolation kit (Ambion, AM1560). Samples were processed according to the Cell Lysate Protocol (nCounter Gene Expression Protocol, NanoString) or the Total RNA Extraction Protocol (Ambion). Four μL of cell lysate (for Cell-Count normalization) or 100ng of total RNA (for Total RNA normalization) was subsequently incubated overnight at 65°C in nCounter Reporter CodeSet, Capture ProbeSet and hybridization buffer. Following hybridization, samples were immediately processed with the nCounter PrepStation and subsequently analyzed on an nCounter Digital Analyzer. All samples were processed in biological duplicate.
We used two custom nCounter Reporter CodeSets encompassing 429 genes. These codsets encompassed sets of known cancer related genes (CodeSet CS-1, CS-2) (Delmore et al., 2011). For each NanoString dataset, we used a piecewise linear interpolation of control RNAs (added after hybridization as part of the nCounter PrepStation protocol) to their known concentrations to normalize each dataset. The normalized signals for the biological duplicates were averaged together. 266 genes showing active expression (a normalized value of 1.0 or greater) in both the average low-Myc Total-RNA, and average low-Myc Cell-Count samples were selected, and the log2 fold ratio between the low-Myc and high-Myc samples were calculated and shown as a heatmap.
We thank Tom Volkert, Jeong-Ah Kwen, Jennifer Love, Sumeet Gupta, at the Whitehead Genome Technologies Core for Solexa sequencing and microarray processing and Ziv Bar-Joseph for critical comments. This work was supported by National Institutes of Health grants HG002668 (RAY) and CA146445 (RAY, TL), an American Cancer Society Postdoctoral Fellowship PF-11-042-01-DMC (PBR) and a Swedish Research Council Postdoctoral Fellowship VR-B0086301 (JL).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.