This work identifies statistical algorithms that need to be included in the analysis of two-dimensional gels for accurate determination of differential changes. Two-dimensional electrophoresis is a powerful tool for determining differential protein expression in complex mixtures, but the methodology, to date, is not producing the expected results because of the degree of gel variability. The new DIGE procedure, which compares two samples in the same gel, eliminates some of the variability introduced by gel-to-gel comparison, but still has variability due to differences in dye binding, charge, and fluorescence.1 Introducing quality-assurance statistical algorithms is necessary to extract meaningful data from the gels. A quality-control analysis of replicate gels needs to be performed before the set is used in the final analysis. Increasing replicates to five from the usual three can only add greater variability. A statistical “replicate quality” gel test needs to be done on the computer gel scans, and replicates with greater than 20–30% variability should not be used. In addition, since spot intensity data are not normally distributed, spot differential analysis cannot rely on a t-test. The Studentized Range2,3 has been suggested as a more accurate method for calculating significant difference.
Our laboratory uses Proteomweaver software for analysis, which lets the user select sensitivity and size of spot to be detected. There are many questions about handling spot intensity data. The problems of interpreting the data stem from the electrophoretic methodology (details of method, stain, etc.) and from the choice of statistical methodology. This communication looks at two specific problems of statistical methodology: (1) number and quality of replicates, and (2) appropriate statistical method for analysis.
It is essential to determine “goodness of replicates” before gels are used in the final analysis. While statistical power comes from increased sample size, the variability inherent in two-dimensional (2D) electrophoresis makes data from increasing the number of gel runs less reliable. A three-replicate-per-sample unit will give best results, but gels must be analyzed for “replicate quality” before these gels, or averages of these gels, are used in the final analysis.
Because the number of replicates is small, a per-spot t-test cannot be used. Ideally, matched spots should have the same intensity in all the gels, giving a quotient of 1. Proteomweaver software includes a replication quality assurance algorithm which uses a confidence level statistic over 500 matching spots in two replicate gels. The algorithm measures the deviation of the quotient from the ideal by converting the data to log normal and using the standard deviation as an exponent. A confidence level from 0.05 to 0.001 can be selected.
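The Proteomweaver algorithm itself is not specified in detail here, so the following is only a minimal sketch of one possible reading of "using the standard deviation as an exponent": take the natural log of each matched-spot quotient, measure the spread of those logs about the ideal value of 0 (quotient 1), and exponentiate the result to obtain a multiplicative deviation factor. The function name `deviation_factor` and the choice of a root-mean-square spread about zero are assumptions made for illustration, not the vendor's implementation.

```python
from math import exp, log, sqrt


def deviation_factor(gel_a, gel_b):
    """Multiplicative deviation of matched-spot quotients from the ideal of 1.

    gel_a, gel_b: intensities of matched spots in the same order.
    Pairs with a missing (None) or non-positive intensity are skipped.
    Returns exp(s), where s is the root-mean-square of ln(a / b) about 0.
    A result of 1.0 means the replicates agree perfectly (assumed reading).
    """
    logs = [
        log(a / b)
        for a, b in zip(gel_a, gel_b)
        if a is not None and b is not None and a > 0 and b > 0
    ]
    if not logs:
        raise ValueError("no matched spots with positive intensities")
    s = sqrt(sum(x * x for x in logs) / len(logs))  # spread about log(1) = 0
    return exp(s)
```

Under this reading, a factor close to 1 indicates good replicates, and a factor of 2 would mean that matched spots typically differ by a factor of two between the gels.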
If Proteomweaver software is not available, another “goodness of replication” method uses the Studentized Range. The distribution of the quotients of two replicate gels is analyzed in JMP4 statistical software to determine the standard deviation. The quotients are divided by the standard deviation. Spots are significantly different when the resulting value is greater than 2.77 (or less than 0.36) at α = 0.05, or greater than 3.64 (or less than 0.27) at α = 0.01; these are the significant difference factors of the Studentized Range. The Studentized Range, Q, is used in multiple-comparison methods. Q is defined as the range of means divided by the estimated standard error of the mean for a set of samples being compared. The estimated standard error of the mean for a group of samples is usually derived from analysis of variance. The two replicates (Figure 1) had 59 spots out of 521 that were significantly different, indicating an 11% variation. Missing data were not included. The researcher must decide the acceptable degree of variation, but certainly greater than 20% would make the final data questionable.
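The cut-offs and the percentage quoted above can be checked numerically. For a comparison of two means with a large number of degrees of freedom, the Studentized Range critical value equals √2 times the two-sided standard normal quantile, which gives 2.77 and 3.64; the lower bounds 0.36 and 0.27 are the reciprocals of those values. The sketch below assumes that large-sample case and is not the JMP procedure; the function name `q_critical_two_means` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def q_critical_two_means(alpha):
    """Studentized Range critical value for two means, infinite degrees of freedom.

    With two means, the range is |X1 - X2|, which is sqrt(2) times a
    standard normal deviate, so the critical value is sqrt(2) * z(1 - alpha/2).
    """
    return sqrt(2) * NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01):
    upper = q_critical_two_means(alpha)
    lower = 1 / upper  # reciprocal bound for quotients below 1
    print(f"alpha={alpha}: upper={upper:.2f}, lower={lower:.2f}")

# Share of matched spots flagged in the two replicates of Figure 1
flagged, matched = 59, 521
print(f"variation = {flagged / matched:.0%}")
```

The 11% figure is simply the flagged spots as a fraction of all matched spots, which is the quantity to compare against the 20% ceiling suggested in the text.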
Proteomweaver software allows the user to select sensitivity and size of spot to be detected. Best results come when the sensitivity selection is clearly above background noise. The software matches automatically, without any “landmarking.” Quality of match is identified by color, with green being an excellent match and red indicating that the match is very poor. In between, shades move along a mix of red and green, giving a spectrum of tan colors. The gels can be examined in several different views to determine “goodness of match.” In the vector view, green vectors indicate well-matched spots. Red vectors indicate a secondary match. It is possible to manually correct a match, but better data come from using only matches that are predominantly on the green end of the spectrum. A red match should not be used.
Gels can be matched to a reference gel, or all gels can be matched to each other. The spot intensity data are transferred to JMP. Using the standard deviation of the distribution of the differences (or quotient) between two gels, the Studentized Range is computed as the difference between spot intensities divided by the standard deviation of this difference. All spots greater than 2.77 (at a significance level of 0.05) are considered to be significantly different. Missing data are not included in this comparison.
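The comparison described above can be sketched in a few lines once the spot intensities have been exported. This is a minimal illustration, not the JMP workflow: the function name `significant_spots`, the dictionary layout (spot identifier mapped to intensity, with `None` for a missing spot), and the use of the absolute difference are assumptions. The cut-off of 2.77 is the value given in the text for a significance level of 0.05.

```python
from statistics import stdev


def significant_spots(gel_a, gel_b, q_crit=2.77):
    """Return {spot_id: statistic} for spots that differ between two gels.

    gel_a, gel_b: dicts mapping spot id -> intensity (None if missing).
    The statistic is |difference| divided by the standard deviation of
    all differences; spots above q_crit are reported.
    Spots missing in either gel are left out of the comparison.
    """
    diffs = {
        spot: gel_a[spot] - gel_b[spot]
        for spot in gel_a
        if gel_a[spot] is not None and gel_b.get(spot) is not None
    }
    if len(diffs) < 2:
        raise ValueError("need at least two matched spots")
    sd = stdev(list(diffs.values()))
    if sd == 0:
        return {}  # all differences identical: nothing stands out
    return {
        spot: abs(d) / sd
        for spot, d in diffs.items()
        if abs(d) / sd > q_crit
    }
```

Calling `significant_spots(gel_a, gel_b)` on two matched gels returns only the spots whose difference exceeds the cut-off, together with the statistic for each, so the flagged fraction can be computed as `len(result)` divided by the number of matched spots.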
In addition, spots missing in one gel are compared to the spot intensity in the match gel using the Studentized Range to determine which spots are significant in their absence.