To study the behavior of DiNAMIC under the null hypothesis that no recurrent CNAs are present, a number of simulated null datasets were created and analyzed with DiNAMIC; the results are presented in Table 1. Various marker spacing and correlation schemes were considered in order to show that DiNAMIC is robust to the types of deviation from stationarity found in real datasets. A full description of the simulated datasets appears in the Null Simulation Studies section of the Supplementary Material. In each case, the observed type I error was computed as follows.
1. Create a data matrix Xl using the appropriate simulation scheme.
2. Compute the p-value of Tgain(Xl) using N = 1000 cyclic shifts of Xl.
3. Determine whether Tgain(Xl) is significant at the α = 0.05 level.
Steps (1)–(3) were repeated 10 000 times, and the observed type I error was defined as the proportion of the Tgain(Xl) values that were significant at the α = 0.05 level.
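The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DiNAMIC implementation: Tgain is taken to be the maximum column sum of the subjects-by-markers matrix, each row is rotated by its own random offset, the null data are i.i.d. standard normal values (the paper's simulation schemes are described only in the Supplementary Material), and the p-value uses a common add-one permutation estimate. All function names and the small default sizes are hypothetical.

```python
import random


def t_gain(matrix):
    """Maximum column sum of a subjects-by-markers matrix (assumed form of Tgain)."""
    return max(sum(column) for column in zip(*matrix))


def cyclic_shift(matrix, rng):
    """Rotate every row by its own random offset, keeping within-row order."""
    shifted = []
    for row in matrix:
        offset = rng.randrange(len(row))
        shifted.append(row[offset:] + row[:offset])
    return shifted


def cyclic_shift_pvalue(matrix, n_shifts, rng):
    """Estimate the p-value of Tgain from n_shifts cyclically shifted copies.

    The add-one estimate is an assumption made for this sketch.
    """
    observed = t_gain(matrix)
    exceed = sum(
        t_gain(cyclic_shift(matrix, rng)) >= observed for _ in range(n_shifts)
    )
    return (exceed + 1) / (n_shifts + 1)


def simulate_null(n_subjects, n_markers, rng):
    """Hypothetical null scheme: independent standard normal markers."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_markers)] for _ in range(n_subjects)]


def observed_type1_error(n_datasets, n_shifts, alpha, rng, n_subjects=10, n_markers=30):
    """Proportion of simulated null datasets declared significant at level alpha."""
    rejections = 0
    for _ in range(n_datasets):
        x = simulate_null(n_subjects, n_markers, rng)   # step 1
        p = cyclic_shift_pvalue(x, n_shifts, rng)       # step 2
        if p < alpha:                                   # step 3
            rejections += 1
    return rejections / n_datasets
```

Calling `observed_type1_error(10000, 1000, 0.05, random.Random(1))` would mirror the repetition counts quoted in the text, although the toy data dimensions here are far smaller than a real copy number dataset.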
| Table 1. Observed type I error for datasets simulated under the null hypothesis |
The values of the observed type I error given in Table 1 suggest that DiNAMIC is slightly conservative, which seems reasonable in light of the effect of the cyclic shift procedure on the underlying correlation of the markers. After a cyclic shift, markers on either side of a breakpoint are essentially independent, and hence they are likely to exhibit greater variability than neighboring markers in the original data. As a result, the distribution of the maximum column sum after cyclic shift should yield larger values than the corresponding distribution for the original data, and the distribution of the minimum column sum should likewise yield more extreme values. Because the values in Table 1 are quite close to 0.05, any difference between the distributions appears to be very minor.
Additional simulations were performed under the alternative hypothesis that a recurrent CNA is present; a detailed discussion of these simulations can be found in the Power Simulations and Peeling Accuracy section of the Supplementary Material. Briefly, these simulations show that DiNAMIC has equal power to detect gains and losses, and that its power to detect CNAs increases with the effect size of the aberration. Both properties are illustrated by the corresponding power curves.
Next we present the results of the analysis of two publicly available tumor datasets. The dataset of Natrajan et al. (2006) contains a number of copy number gain and loss loci that are potentially statistically significant. Using both GISTIC and DiNAMIC's Detailed Look, we analyzed a segmented version of this dataset after applying the bias correction scheme described in Section 2. Because no normal tissue reference set was available, the thresholds for amplification and deletion, which are required input parameters for GISTIC, were set to the default values of ±0.1. We report all markers that were peeled by DiNAMIC and have either p(Tgain(X)) < 0.025 or p(Tloss(X)) < 0.025 (marked by ‘X’), thereby controlling the overall genome-wide family-wise error rate (FWER) at α = 0.05. For comparison, we also show all regions detected by GISTIC. By default, GISTIC uses an FDR threshold of q = 0.25; to facilitate comparison with DiNAMIC, we distinguish GISTIC findings with q < 0.05 (marked by ‘X’) from those with 0.05 ≤ q < 0.25 (marked by ‘O’). Note that fewer regions are declared significant by GISTIC than by DiNAMIC at the respective 0.05 levels. For a given error threshold, control of the FWER is more conservative than control of the FDR, so the larger number of DiNAMIC findings cannot be attributed to a more lenient error criterion, and the comparison is meaningful.
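The marking rules described above can be written down compactly. The sketch below is illustrative only: the function names are hypothetical, and reading the 0.025 cutoff as an even split of α = 0.05 between the gain and loss tests is our interpretation of the text rather than a statement about either software package.

```python
def dinamic_mark(p_gain, p_loss, alpha=0.05):
    """Return 'X' if either one-sided p-value falls below alpha / 2.

    Splitting alpha evenly between the gain and loss tests bounds the
    overall error rate by alpha (union bound); this is an assumed reading.
    """
    per_test = alpha / 2
    return "X" if (p_gain < per_test or p_loss < per_test) else ""


def gistic_mark(q, strict=0.05, default=0.25):
    """Return 'X' for q < strict, 'O' for strict <= q < default, else ''."""
    if q < strict:
        return "X"
    if q < default:
        return "O"
    return ""
```

For example, a marker with p(Tgain(X)) = 0.01 would be marked ‘X’ by `dinamic_mark`, whereas a GISTIC region with q = 0.10 would be marked ‘O’ by `gistic_mark`.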
Natrajan et al. (2006) noted that the most common copy number gains were found in 1q, 8 and 12, with focal gains located at 1q22-25, 8p21-12 and 12p13. Both DiNAMIC and GISTIC detected markers corresponding to these gains. Both methods also detected markers at 9q34, the site of the SET oncogene, a finding supported by the SET protein amplification reported by Carlson et al. (1998) in Wilms' tumor. Natrajan et al. (2006) also found that gains at 13q31 and 16p13 were associated with tumor relapse. Both methods detected 16p13. DiNAMIC's P-value for the locus in 13q31 is significant at the 0.05 level, whereas GISTIC's q-value for the locus in the neighboring cytoband 13q32 is not. DiNAMIC's detection of 7q34 and 8q24 is noteworthy because the oncogenes BRAF and c-Myc lie in these regions, respectively. Neither of these regions was detected by GISTIC.
Losses at 10p15 and 11p13 were found by Natrajan et al. (2006) in a number of subjects; these are the sites of WT1 and WT2, genes known to be associated with Wilms' tumor. Both loci were detected by DiNAMIC and GISTIC. The same authors concluded that loss of 21q22 was associated with tumor relapse; both methods detected the nearby locus 21q21. Although the loss sites found by the two methods on 1p, 11q and 16q are not identical, the differences appear to be minor. Using linkage analysis, Rahman et al. (1996) discovered FWT1/WT4, a familial Wilms' tumor gene located on 17q12. This site was detected by DiNAMIC but not GISTIC. The gene PDCD6 is located on 5p15, a site that was found by DiNAMIC but not GISTIC. Because PDCD6 is known to be associated with programmed cell death, detection of this locus may have biological relevance.
GISTIC and DiNAMIC's Detailed Look were also used to analyze the glioma dataset of Kotliarov et al. (2006). This dataset contains copy number values from 178 tumors, 82 of which are glioblastomas. As above, GISTIC's amplification and deletion thresholds were set to the default values of ±0.1; the q-value threshold was 0.05. With these settings, GISTIC found 47 significant gain regions and 20 significant loss regions. DiNAMIC found over 100 loci for gains and losses to be significant at the α = 0.05 level. The maximum column sum identified the most aberrant marker, which is marker 55489 in chr7. The column sums near this marker, displayed alongside nearby RefSeq genes (hg18 genomic annotation tracks), show that the highest peak includes EGFR and a region upstream. EGFR amplification is a very common genetic alteration in glioblastoma (Heimberger et al., 2005), so this peak is a reassuring illustration of the DiNAMIC procedure.
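To make the role of the maximum column sum concrete, the following sketch locates the most aberrant marker in a small subjects-by-markers matrix. It assumes only that the gain statistic is the largest column sum; the function names and the toy data are hypothetical and unrelated to the glioma dataset.

```python
def column_sums(matrix):
    """Sum copy number values across subjects (rows) for each marker (column)."""
    return [sum(column) for column in zip(*matrix)]


def most_aberrant_marker(matrix):
    """Return (index, column sum) of the marker with the largest column sum."""
    sums = column_sums(matrix)
    index = max(range(len(sums)), key=sums.__getitem__)
    return index, sums[index]


# Toy example: three subjects, four markers; marker 2 carries a shared gain.
toy = [
    [0, 1, 3, 0],
    [1, 0, 2, -1],
    [0, 0, 4, 1],
]
peak_index, peak_sum = most_aberrant_marker(toy)
```

Inspecting `column_sums(toy)` around `peak_index` is the toy analogue of examining the column sums in the neighborhood of the peak marker.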