Within-slide normalization: intensity- and spatially-dependent systematic error
We first address within-slide normalization, i.e. normalization issues associated with data obtained from a single slide. A well-known source of error can be attributed to biases linked to the different dyes used at the labeling step. Current methods of global normalization assume a uniform grading of systematic error across all variables in an experiment. Two major assumptions are usually made: (i) all cDNA species within a sample will incorporate an equivalent amount of dye per mole cDNA; (ii) there are no other variables (e.g. spatial location, overall intensity, plate) that contribute to dye biases across the slide. These assumptions are too simplistic to account for the multiple sources of systematic error typically encountered in microarray experiments. The problem is best illustrated in an experiment where identical mRNA samples are labeled with Cy3 and Cy5 and subsequently hybridized to the same slide [self–self comparison; described in Dudoit
et al. (
10)]. In a ‘perfect’ self–self hybridization the intensity log ratios
M in an MA-plot should be evenly distributed around zero across all intensity values
A. However, this is rarely the case, and systematic error often manifests itself in terms of non-zero log ratios
M. Furthermore, the imbalance in the red and green intensities is usually not constant across the spots and can vary according to overall spot intensity
A (indicated by a curvature in the MA-plot), location on the array, plate origin and possibly other variables.
Intensity-dependent dye bias can be seen in the apo AI experiment (
14). Apo AI is a gene known to play a pivotal role in high-density lipoprotein metabolism. The goal of the experiment was to identify genes with altered expression in the livers of mice with the apo AI gene knocked out compared with inbred C57Bl/6 control mice. In this instance, it was found that the vast majority of genes examined on the microarray showed no difference in expression level. The clear curvature in the MA-plot in Figure A strongly suggests the existence of an intensity-dependent dye bias.
Some systematic differences may exist between the print tips, such as slight differences in the length or in the opening of the tips, and deformation after many hours of printing. We therefore also performed individual lowess fits within each print tip group. The arrays in the apo AI experiment were printed with a 4 × 4 print head, so each lowess fit in Figure corresponds to spots printed with a single print tip. Four within-print tip group lowess curves stand out from the remaining 12 curves, indicating strong print tip or spatial effects. These four curves correspond to the last row of print tips in the 4 × 4 print head (print tips 13–16). This pattern was visible in the raw images, where the bottom four grids tended to have a higher red signal. We further examined the spatial effects by considering box plots of the log ratios M for each print tip group. Figure shows that print tip groups 13–16 have a larger spread in their log ratios than any of the other 12 print tip groups. Such a difference in spread may result in misidentification of genes that are differentially expressed in the knockout mice compared to the control mice. Thus, normalization for scale across print tip groups seems desirable here.
Within-slide normalization using the majority of genes on the microarray
For the apo AI experiment considered in Figures and , global normalization, in which a constant adjustment is used to force the distribution of the log ratios to have a median zero within each slide, would result in a vertical translation of the MA-plot. It would not correct for intensity- or spatially-dependent effects, including local differences in the spread of the log ratios M. As a first pass towards eliminating intensity and spatial biases, we considered a normalization procedure in which the majority of genes on the array are used for normalization. This is a reasonable assumption when there are good reasons to expect that (i) only a relatively small proportion of the genes will vary significantly in expression between the two co-hybridized mRNA samples or (ii) there is symmetry in the expression levels of the up/down-regulated genes. The data shown in Figures and are good examples of this situation.
To address both intensity and spatial normalization issues, we first incorporated an intensity modifier into our normalization procedures. We used the scatter plot smoother lowess to produce robust location estimates of the intensity log ratios M for various intensity levels A and to adjust each gene with a different normalization value depending on its overall intensity. Other variables that may contribute to systematic bias include differences in print tips and spatial location. Because every grid in an array is printed using the same print tip, print tip groups can also be used as proxies for spatial effects on the slide. Thus, we also incorporated a print tip modifier into the intensity-dependent normalization. It might be thought that the layout of genes on the slide could lead to one or more print tip groups being enriched for differentially expressed genes and, hence, invalidate the assumption underlying print tip group normalization. While we cannot rule out chance imbalances in the spatial distribution of differentially expressed genes, the mechanics of spotting cDNA onto the slide makes a large effect of this kind unlikely. Even if one had a collection of genes known or expected to be differentially expressed in one or more microtiter plates, they would be spotted evenly across the slide by the printer.
In principle, after within-print tip group location normalization, the log ratios from the different print tip groups should be centered around zero (Fig. B). However, it is possible that the log ratios from the various print tip groups have different spreads; if this is the case, a scale adjustment may be required. Figure displays box plots of the intensity log ratios M for a slide in experiment A, before normalization (Fig. A), after within-print tip group location normalization (Fig. B) and after within-print tip group location and scale normalization (Fig. C). In Figure B there is a disproportionately large number of extreme log ratios in the lower four grids. After scale normalization, the extreme log ratios are evenly distributed on the array (Fig. C). Again, this procedure assumes that a relatively small proportion of the genes vary significantly in expression between the two co-hybridized mRNA samples, as would be expected when comparing samples from wild-type mice versus mice harboring a mutation in a single gene. In addition, it is assumed that the spread of the distribution of the log ratios should be roughly the same for all print tip groups. The robust statistic MAD, like the robust lowess smoother, will not be affected by a small percentage of differentially expressed genes, which will appear as outliers in the MA-plots.
In another example of within-slide location normalization, Figure shows an MA-plot from experiment B, for a comparison of mRNA levels in the anterior and posterior portions of the mouse olfactory bulb. These mRNA samples are biologically very similar and very few genes are expected to be differentially expressed. Indeed, the MA-plot shows very little divergence from the lowess fit to all genes and the scatter plot is roughly symmetrical about the lowess curve. We thus performed a print tip group normalization using all genes. Figure displays the intensity data before (Fig. , left) and after normalization (Fig. , right). The normalization procedure resulted in a scatter plot centered around an M value of zero across the A intensity range, thus indicating that the types of systematic errors we have identified have been minimized.
Within-slide normalization using MSP
Frequently, the expression profiles in biological samples are more divergent in nature than in the examples investigated above. Thus, normalization based upon all genes may be inaccurate. A control sample that spans the intensity range and exhibits a relatively constant expression level across biological samples is desirable. Yeast genomic DNA has been used for normalization in that system. Since all species within an mRNA sample can hybridize to this control, sample-specific bias is reduced. The genomic DNA approach does not, however, directly extend to more complex metazoan systems, where the high ratio of non-coding to coding DNA effectively reduces the signal from such a control below the detection threshold in a microarray experiment.
We therefore constructed a novel control sample ensemble, MSP, inclusive of all genes present on the microarray. This sample should be analogous to genomic DNA without the intervening sequences and, thus, provides a potential probe for every species within a labeled cDNA target. We titrated this sample over the intensity range of a typical microarray experiment in order to account for all levels of intensity-dependent bias. The utility of this control is demonstrated in Figure , which highlights the MSP titration series (cyan dots) and the corresponding lowess fit to the MSP spots (cyan curve). Notice that the MSP curve is the same as the lowess fit to the MA-plot based on all genes (red curve). An intensity-dependent normalization using the MSP control as a reference would thus be similar, in this case, to that using all genes.
In experiment B we made a more divergent comparison between mRNA samples from the medial and lateral portions of the olfactory bulb. Due to the presence of vascular tissue near the medial bulb, medial samples have a higher representation of blood tissue. Figure A displays the MA-plot for the medial versus lateral comparison. The genetic divergence between the samples is evident in the increased spread of the log ratios, particularly in the high intensity range. The lowess curve based on all genes (red) and the lowess curve based on the MSP titration series (cyan) are different at high intensity values. In such a case, where samples are widely divergent at high intensities, normalization based on the MSP titration series appears to be more accurate. However, whereas the MSP spots produce more accurate estimates of expression levels, these estimates may be less stable in the context of spatial normalization, due to the small number of MSP spots per print tip group.
Comparisons of MSP to other control samples
In some instances a small number of known genes, for example housekeeping genes whose expression is expected to be constant across samples, are utilized for microarray normalization. Such genes are often highly expressed, as illustrated in Figure for tubulin and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) (yellow). Typically, housekeeping genes are not representative of all intensity values A and are therefore limited in their utility for intensity-dependent normalization. In addition, there is a sample-specific bias for many genes which may not be predictable; this is again non-ideal for use as a control.
Another approach is to select a rank-invariant set of genes. A set of genes is said to be rank-invariant if their ranks are the same for the red and green intensities. In practice, a maximal invariant set tends to be too small and an iterative procedure for finding an approximately invariant set of genes has been proposed (
19,
20). These genes are highlighted in orange in Figure and were obtained using the method described in Tseng
et al. (
20) with
P = 0.01 and
l = 25. The value
P is chosen such that a conserved set of genes is selected. Notice that this set of spots overlaps the lowess fit to the MA-plot based on all genes.
Composite within-slide normalization
We propose a composite normalization method to address the limitations of using all genes or only the MSP titration series for normalization. The composite normalization curve is a weighted average of the MSP curve and the within-print tip group lowess curve based on all genes. The weights are dependent on the cumulative number of genes at different intensity levels. Figure B shows the MA-plot after within-print tip group normalization. In this figure the green composite normalization curve is a weighted average of the red and cyan colored curves. Note that the divergence of the red from the green curve at high intensity values still persists after normalization. In practice, we find composite normalization necessary in the case of divergent samples. Biologically significant outliers in experiment B were more consistently identified when composite normalization was incorporated in the analysis (data not shown).
Comparison between different normalization methods
In order to compare the different within-slide normalization methods, we considered their effect on the location and scale of the log ratios M. Figure A shows density plots of the log ratios for different normalization methods. Without normalization (black curve) the log ratios are centered around –1, indicating a bias towards the green (Cy3) dye. A global median normalization (red curve) shifts the center of the log ratio distribution to zero, but does not affect the spread. The dependence of the log ratio M on the overall intensity A is also still present (see Fig. ). Both the intensity-dependent (green curve) and within-print tip group (blue curve) location normalization methods reduce the spread of the log ratios compared to a global normalization. A within-print tip group scale normalization (cyan curve) further reduces the spread slightly.
The different methods were also evaluated based on their ability to identify genes which are known to be differentially expressed. For experiment A the apo AI gene is knocked out in the eight treatment mice, so one expects the t-statistics to take on very large negative values for this gene. Figure B shows a truncated plot of the extreme t-statistics for each of the methods. The global median, intensity-dependent and within-print tip group location normalization methods seem to perform best in terms of their ability to detect the three copies of the knocked out apo AI gene. A good method should enable a clear distinction between differentially and constantly expressed genes as reflected by the t-statistic, i.e. one expects a large jump in the t-statistic between the least extreme of the differentially expressed genes and the most extreme of the remaining genes. The largest jump in P-values is observed for within-print tip group location normalization. Thus, in the situation presented by experiment A, where log ratios from the different arrays have fairly similar spreads (see Fig. ), within-print tip group location normalization enables the best separation between differentially expressed genes and noise.
Multiple slide normalization
Having addressed location and scale normalization issues within a slide, all normalized log ratios should be centered around zero. However, in many experiments expression levels must be compared across different slides. It is important to note that individual slides in a multiple slide comparison may need to be adjusted for scale when the different slides have substantially different spreads in their intensity log ratios. Failing to perform a scale normalization could lead to one or more slides having undue weight when averaging log ratios across slides. We can apply the principles used for within-slide print tip group scale normalization to multiple slide scale adjustment.
In practice, the need for scale normalization between slides will be determined empirically. Figure displays box plots of the log ratios for each of the 16 slides in experiment A, after within-print tip group location and scale normalization. The box plots are centered at zero and have fairly similar spreads. In this instance we chose not to adjust for scale, as the noise introduced by a scale normalization of the different slides may be more detrimental than a small difference in scale.