First, we tested the algorithms on simulated data of various abnormality widths and noise levels. To generate ROC curves corresponding to a particular aberration width and noise level, we calculated the true positive rates (TPR) and the false positive rates (FPR) as we varied the threshold for determining an aberration. We also tested the algorithms on cDNA microarray data containing measurements from 26 different primary Glioblastoma Multiforme (GBM) tumors (
Bredel et al., 2005).
Simulated Data
We calculated the ROC profiles of each algorithm for aberration widths of 5, 10, 20, and 40 probes and signal-to-noise ratios (SNR) of 1, 2, 3, and 4. SNR was defined as the mean magnitude of the aberration (i.e., signal) divided by the standard deviation of the superimposed Gaussian noise. illustrates the kind of profiles examined in this simulation. For each aberration width and SNR, we generated 100 artificial chromosomes, each consisting of 100 probes and with the square-wave signal profile added to the center of the chromosome. The performance of the algorithms for the aberrations at the boundaries was not examined here.
TPR was defined as the number of probes inside the aberration whose fitted values are above the threshold level divided by the number of probes in the aberration. FPR was defined as the number of probes outside the aberration whose fitted values are above the threshold level divided by the total number of probes outside the aberration. In order to compute the ROC curve, we varied the threshold value for aberration from the minimum log-ratio value to the maximum. (This is equivalent to moving the x-axis cutoff value in a mixture distribution.) Each threshold value results in a TPR and a FPR, represented by a point on the ROC curve. A set of TPRs and FPRs were then plotted to reveal the algorithm's ROC profile for the particular aberration width and SNR (see ). We also computed confidence intervals around each ROC curve (not shown) but there did not appear to be significant differences among the methods.
We note that TPR and FPR are informative in understanding how an algorithm performs in estimating the boundary of the altered region. When the algorithm over-estimates the boundary, FPR increases while TPR remains fixed; when it under-estimates the boundary, TPR decreases while FPR remains fixed. We also note that FPR estimates depend on the size of the aberration relative to that of the chromosome. Therefore, FPR should be used only for measuring relative performance among different methods given a fixed aberration size.
The default parameters in the software were used except in the following cases where the parameters had to be chosen by the user. For quantile smoothing, we followed the suggestion of
Eilers and de Menezes (2005) to use two-fold cross-validation to estimate the value of
λ that minimizes the overall penalty term; this gave us a
λ of 1.5. For the wavelet denoising algorithm, we chose soft Stein's Unbiased Risk Estimate thresholding with a maximum wavelet coefficient level of 3 based on results given in their paper. For lowess, we used a smoothing window of 10 probes and defined the smoothing span as the size of the smoothing window divided by the number of non-missing log-ratios in the chromosome. For the genetic algorithm (
Jong et al., 2003), we deselected the option to filter out log-ratios with a threshold value of 0.75 so that their program will consider all data points in the analysis. For Analysis of Copy Errors (
Lingjaerde et al., 2005), we chose results for the estimated false discovery rate closest to 0.001.
As shown in , most algorithms did well in the case of detecting the existence and the width of aberrations for the large changes and high SNRs (upper left panels). For the cases of smaller aberrations and low SNRs, the smoothing methods (i.e., wavelets, lowess, quantile regression) gave better detection results (higher TPR and lower FPR) than other methods. The smoothing algorithms followed low amplitude and local trends in the data better than the other algorithms which were less sensitive to such features.
Among the methods that perform estimation (indicated by ‘E’ in ), the homoscedastic algorithm in
Picard et al. (2005) appeared to perform better than other methods. However, none of the algorithms reliably detected the aberrations with small width and low SNR because the signal is too weak to differentiate from the noise. Compared to other algorithms, the CLAC algorithm tended to overestimate the boundaries of the aberrations. The mean smoothing step in CLAC appeared to reduce noise in the artificial chromosome data at the expense of blurring the edges of the boundaries. The ChARM algorithm did not detect the presence of aberrations in every artificial chromosome even in the large width, high SNR case, irrespective of the cutoff p-values for the mean and sign tests. We suspect that this may be due to the fact that its boundary detection step is local and decoupled from the overall estimation.
| Table 1List of algorithms tested in this paper. For the last column, ‘S’ and ‘E’ indicate that the algorithm has a step for smoothing and estimation, respectively. Three methods (Quantreg, Wavelet, and Lowess) are for smoothing (more ...) |
Glioblastoma Multiforme (GBM) Data
There are 26 samples representing primary GBMs in the glioma data from
Bredel et al. (2005). GBM is a particularly malignant type of brain tumor, with a median patient survival time of a year. The samples were co-hybridized with pooled human controls onto custom spotted cDNA microarrays. The scanned raw data were downloaded from the Stanford Microarray Database (
http://smd.stanford.edu). For the purpose of this paper, the array data were normalized by print-tip group, intensity-dependent normalization with the Limma package (
Smyth, 2004). Of the 41,421 elements on each array, we were able to link 33,599 of them to chromosomal positions using mapped EST data from the hg16 build of the UCSC Genome Browser (
http://genome.ucsc.edu). Missing values in each array were removed to avoid the effect of imputed values in subsequent analyses.
Though noisy (standard deviation of the log-ratios for each array ranges from 0.35 to 0.9), the GBM data contained a mixture of larger, low amplitude regions of gains/losses and smaller, high amplitude regions of amplifications/deletions. These types of copy number alterations represent the types of aberrations the array CGH algorithms should detect. Two examples representing a broad, low amplitude change and a smaller, high amplitude one are examined in the following paragraphs.
Numerous regions of gains/losses have been found in many microarray studies on gliomas (
Koschny et al., 2002). For instance, gain of chromosome 7 and losses of chromosomes 10 and of large portions of 13 and 22 have been observed in GBMs previously. These gains and losses may be the effect of uncontrolled mitotic events from point mutations of oncogenes and tumor-suppressing genes. In the sample GBM31, there exists a large region of loss on chromosome 13. The overall magnitude of the loss is very low because not all tumor cells in a given sample have the same types of gains and losses. It may also be due to the presence of connective tissues and other non-tumor cells in the sample. As a consequence of sample heterogeneity, the signal is diluted, thus complicating the detection procedure for the algorithms.
As can be seen in , most algorithms in the study detected the proximal loss of chromosome 13 of GBM31. CGHseg, GLAD, CBS, GA all clearly identified nearly identical regions. All three smoothing algorithms showed the same general trend but the global loss was obscured by the local features. These algorithms performed well in detecting the smaller aberrations in the simulated data, but they were not as useful for a global view. HMM did not separate chromosome 13 into two regions. In addition to detecting the loss, GLAD identified numerous single-probe outliers. Such outliers can either indicate a real focal aberration, some type of polymorphism, or an experimental artifact (e.g., bad probe). CLAC and ACE detected the region of loss as a series of smaller losses. Many smaller regions within chromosome 13 that CLAC and ACE did not detect as losses coincided with localized positive spikes in log-ratios.
The GBM data also contained numerous amplifications. Several amplifications, such as those around PDGFRA, CDK4, and MDM2, have been well-studied in GBMs (
Kraus et al., 2002). The amplification at the EGFR locus has been implicated in other tumors, and it is clearly present in GBMs, as shown in . In this GBM29 sample, there appeared to be at least three high amplitude amplifications around EGFR. The algorithms CGHseg, quantreg, GLAD, wavelet, and GA detected all three high amplifications. Because there are only 4 probes separating the first two amplifications, methods such as CLAC, CBS, Lowess, and ACE combined the first two amplifications together. It is possible that these two amplifications were in fact a single one, but mapping the probes to their physical positions suggested that they are likely to be two separate aberrations.
CLAC, ACE, and ChARM all use mean smoothing as an initial step to denoise the data. Mean smoothing increases SNR at the cost of blurring the edges of the boundaries. Because of the blurring, CLAC detected the amplifications as two, larger adjacent amplifications. ACE does not merge the amplifications the way CLAC did, as it has an additional step to compensate for the blurring in identifying the boundaries. More sophisticated smoothing methods appear to perform better in general.
ChARM detected the three amplifications as one large region of gain. However, when each region was manually tested for significance in their software, all regions were marked as significant by their mean and sign tests. This indicated that the boundary detection part should be improved. HMM did not detect any of the three amplifications, even though it detected smaller regions in the simulated data. Singular matrices in the iterations were often the source of problems.