We have described a new method for analyzing tiling arrays that generalizes the moving average approach to settings in which the problem can be posed as a hypothesis test at each probe, so that a moving “average” can be applied to the resulting p-values. We applied the method to data for the Ci and Pros transcription factors, for which information on both target genes and binding sites was available.
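A minimal sketch of the windowed combination of probe-level p-values, in Python using only the standard library. The function names, the convention of storing each window's result at its centre probe and the clipping of p-values are our assumptions rather than the authors' implementation. Both combination rules here treat the p-values in a window as independent (no dependence adjustment).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def _clip(p, lo=1e-300, hi=1.0 - 1e-12):
    """Keep p strictly inside (0, 1) so log() and inv_cdf() stay finite."""
    return min(max(p, lo), hi)


def fisher_combined(pvals):
    """Fisher's combined probability test, assuming independent p-values.

    X = -2 * sum(log p_i) is chi-square with 2k df under H0. For even df the
    survival function is exp(-X/2) * sum_{i=0}^{k-1} (X/2)**i / i!.
    """
    half_x = -sum(math.log(_clip(p)) for p in pvals)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvals)):
        term *= half_x / i  # (X/2)**i / i!, built incrementally
        total += term
    return math.exp(-half_x) * total


def stouffer_combined(pvals):
    """Unweighted Stouffer-Liptak test, assuming independent one-sided p-values."""
    z = [-_STD_NORMAL.inv_cdf(_clip(p)) for p in pvals]  # z_i = Phi^-1(1 - p_i)
    z_s = sum(z) / math.sqrt(len(z))
    return _STD_NORMAL.cdf(-z_s)  # upper-tail p-value of the combined z


def generalized_moving_average(pvals, k, combine=fisher_combined):
    """Combine the p-values of every run of k consecutive probes.

    Each combined p-value is stored at the window's centre probe (for even k,
    the probe just right of centre); probes without a full window get None.
    """
    n = len(pvals)
    out = [None] * n
    for start in range(n - k + 1):
        out[start + k // 2] = combine(pvals[start:start + k])
    return out


# Hypothetical probe-level p-values, ordered by genomic position.
probe_p = [0.6, 0.4, 0.01, 0.003, 0.02, 0.5, 0.7]
print(generalized_moving_average(probe_p, 3, stouffer_combined))
```

Passing the combination rule as a function keeps the windowing separate from the test, which mirrors the point that any per-probe test yielding p-values can be plugged in.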
The first set of results, based on a t-test, was used to compare the performance of the method with standard moving average approaches, in which the moving average of a log-ratio or t-statistic is computed (Sections 4.1–4.2). In general, the methods performed comparably on the three metrics used. Because the t-statistic moving average approach is a special case of the generalized moving average framework, this consistency is reassuring. Furthermore, for large sample sizes, the Stouffer-Lipták test should be equivalent to the standard moving average approach: the t distribution then approaches the standard normal, so each probe's z-score is approximately its t-statistic and the Stouffer-Lipták statistic becomes a rescaled moving average of the t-statistics.
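The large-sample equivalence can be written out as follows, assuming one-sided (upper-tail) p-values from t-statistics with ν degrees of freedom and t-distribution CDF F_ν, an unweighted Stouffer-Lipták statistic over a window of k probes, and no dependence adjustment:

```latex
z_i = \Phi^{-1}(1 - p_i) = \Phi^{-1}\!\bigl(F_{\nu}(t_i)\bigr),
\qquad
Z_{\mathrm{SL}} = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} z_i ,

\text{as } \nu \to \infty:\quad F_{\nu} \to \Phi
\;\Rightarrow\; z_i \to t_i
\;\Rightarrow\; Z_{\mathrm{SL}} \to \frac{1}{\sqrt{k}} \sum_{i=1}^{k} t_i = \sqrt{k}\,\bar{t}.
```

Because k is fixed for a given analysis, thresholding Z_SL is then equivalent to thresholding the moving average \bar{t} of the t-statistics.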
The generalized moving average generally performed at least as well as the existing moving average methods, although the best performer varied with the data set and evaluation criterion. As in Kuan et al. (2008), an adjustment was made for the correlation between probes, and our results (–) are consistent with previous work showing that dependence adjustments improve predictions (Bourgon, 2006; Kuan et al., 2008). TileMap, which accounts only for first-order dependence, performs more like the unadjusted methods than the dependence-adjusted ones.
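One common form of dependence adjustment for the Stouffer-Lipták statistic replaces the variance k of the summed z-scores with its value under correlation, k + 2 Σ_{h=1}^{k-1} (k − h) ρ_h, where ρ_h is the correlation between probes h positions apart. The sketch below assumes such a stationary lag-correlation model supplied by the caller; it is illustrative and not necessarily the adjustment used here or in Kuan et al. (2008). Fisher's test needs a different correction (for example, Brown's scaled chi-square approximation), which is not shown.

```python
import math
from statistics import NormalDist


def stouffer_dependence_adjusted(pvals, lag_corr):
    """Stouffer-Liptak combination whose variance allows correlated z-scores.

    lag_corr[h - 1] is the assumed correlation between the z-scores of probes
    h positions apart (a hypothetical stationary correlation model); lags not
    supplied are treated as uncorrelated.
    Var(sum z_i) = k + 2 * sum_{h=1}^{k-1} (k - h) * rho_h.
    """
    nd = NormalDist()
    k = len(pvals)
    z = [-nd.inv_cdf(min(max(p, 1e-300), 1.0 - 1e-12)) for p in pvals]
    var = float(k)
    for h in range(1, k):
        rho = lag_corr[h - 1] if h <= len(lag_corr) else 0.0
        var += 2.0 * (k - h) * rho  # k - h probe pairs are h positions apart
    if var <= 0.0:
        raise ValueError("lag correlations imply a non-positive variance")
    return nd.cdf(-sum(z) / math.sqrt(var))


window = [0.01, 0.02, 0.03]
print(stouffer_dependence_adjusted(window, []))           # independence
print(stouffer_dependence_adjusted(window, [0.5, 0.25]))  # positive correlation
```

With positive correlation the adjusted variance is larger, so the combined p-value is less extreme than under independence; ignoring the correlation would overstate the evidence from neighbouring probes.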
The ER predictions were compared using three evaluation metrics: target gene enrichment, motif enrichment and distance to gene. Such information is useful to investigators evaluating the quality of ER predictions, but there are caveats. First, the consensus binding motif reflects current knowledge and may not fully capture the binding specificity. Second, the sets of target genes are likely to contain both false positives and false negatives. This may be especially true for Pros, whose results are strikingly worse than those for Ci. However, the poorer Pros results may also reflect a weaker enrichment signal or considerably higher autocorrelation, either of which could make prediction difficult. Despite these limitations, the three benchmarks in combination remain useful for comparing methods.
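The degree of autocorrelation along the array can be summarized with the sample autocorrelation of a probe-level series ordered by genomic position. The sketch below uses the usual biased estimator; the choice of series (log-ratios here) and the example values are hypothetical and are not the Ci or Pros data.

```python
def sample_autocorrelation(x, max_lag):
    """Sample autocorrelation r_h for lags 1..max_lag of a probe-ordered series.

    r_h = sum_t (x_t - m)(x_{t+h} - m) / sum_t (x_t - m)^2, the usual
    (biased) estimator that divides every lag by the full sum of squares.
    """
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / denom
            for h in range(1, max_lag + 1)]


# Hypothetical probe-level log-ratios, ordered by genomic position.
log_ratios = [0.1, 0.3, 0.35, 0.2, -0.1, -0.2, 0.05, 0.4]
print(sample_autocorrelation(log_ratios, 2))
```

Dividing every lag by the full sum of squares keeps each r_h within [−1, 1], which is convenient when the estimates feed a dependence adjustment.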
The initial set of results can also be used to evaluate how different cutoffs and options affect the results. As expected, more stringent FDR cutoffs yielded more accurate ER predictions according to the metrics. However, there are two caveats regarding these cutoffs. First, many moving average methods, including those applied here, use Benjamini & Hochberg FDR control, which assumes independent tests even though the tests are locally correlated. Correlation may make FDR estimation unstable, and little is known about how to account for it (Efron, 2007; Schwartzman and Lin, 2009). Second, as in many tiling array analysis methods, the FDR cutoffs apply at the probe level rather than the ER level. Ideally, investigators would have FDR cutoffs that correspond to an ER-level test.
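The probe-level nature of the cutoff can be seen in a sketch of the standard Benjamini-Hochberg step-up procedure followed by merging consecutive significant probes into regions. The function names and the run-merging rule are our own illustrative assumptions. The FDR guarantee applies to the individual probe calls, not to the regions assembled from them.

```python
def benjamini_hochberg(pvals, alpha):
    """Benjamini-Hochberg step-up procedure at FDR level alpha.

    Returns one boolean per probe (True = rejected). The procedure's
    guarantee assumes independent (or positively dependent) tests.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    n_reject = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            n_reject = rank  # largest rank meeting its threshold (step-up)
    calls = [False] * m
    for i in order[:n_reject]:
        calls[i] = True
    return calls


def calls_to_regions(calls):
    """Merge runs of consecutive significant probes into (first, last) index pairs."""
    regions, start = [], None
    for i, called in enumerate(calls):
        if called and start is None:
            start = i
        elif not called and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(calls) - 1))
    return regions


# Hypothetical window-level p-values, one per probe.
probe_p = [0.004, 0.015, 0.9, 0.025, 0.5]
calls = benjamini_hochberg(probe_p, 0.05)
print(calls_to_regions(calls))
```

An ER-level FDR would instead require a test whose units are the candidate regions themselves, which this sketch does not attempt.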
We also evaluated the effect of window size. Window sizes of 3–8 probes were chosen to correspond to typical DamID fragment lengths of 1800–4800 bp. The combined p-value methods appear to be more sensitive to window size. Larger windows generally predict longer ERs and give better target gene enrichment, but worse motif enrichment and distance-to-gene scores. Intermediate window sizes of 4–5 probes appear to balance these two effects.
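Reading the fragment lengths as base pairs, the pairing of 3–8 probes with 1800–4800 bp implies a probe spacing of about 600 bp. The arithmetic below assumes a window's span is roughly the number of probes times that spacing, ignoring probe length and uneven tiling:

```latex
\frac{1800\ \text{bp}}{3} = \frac{4800\ \text{bp}}{8} = 600\ \text{bp per probe},
\qquad \text{span}(k) \approx 600\,k\ \text{bp},
\qquad \text{span}(4) \approx 2400\ \text{bp}, \quad \text{span}(5) \approx 3000\ \text{bp}.
```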
Finally, two versions of the combined p-value method were introduced: Fisher’s Combined Probability Test and the Stouffer-Lipták Test. For the most part, FwD appears to have an advantage over SLwD at smaller window sizes, based on the evaluation metrics. In general, Fisher’s Combined Probability Test places relatively more emphasis on small p-values than the Stouffer-Lipták Test (Loughin, 2004). This difference matters more when the p-values being combined differ greatly in magnitude, as in the analysis of the two Ci data sets in Section 4.3.
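The different emphasis of the two tests can be illustrated by combining two hypothetical windows of three p-values, without dependence adjustment: one window with a single very small p-value among null-like ones, and one with uniformly moderate p-values. Under these assumptions Fisher's test gives the smaller combined p-value for the disparate window, while Stouffer-Lipták gives the smaller one for the concordant window.

```python
import math
from statistics import NormalDist


def fisher_p(pvals):
    """Fisher: X = -2 sum log p_i ~ chi-square(2k); closed-form survival for even df.

    p-values must lie strictly in (0, 1).
    """
    half_x = -sum(math.log(p) for p in pvals)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvals)):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def stouffer_p(pvals):
    """Unweighted Stouffer-Liptak: Z = sum Phi^-1(1 - p_i) / sqrt(k)."""
    nd = NormalDist()
    z = sum(-nd.inv_cdf(p) for p in pvals) / math.sqrt(len(pvals))
    return nd.cdf(-z)


disparate = [1e-8, 0.5, 0.5]     # one very small p-value, two null-like ones
concordant = [0.01, 0.01, 0.01]  # moderately small p-values throughout
for name, ps in [("disparate", disparate), ("concordant", concordant)]:
    print(name, fisher_p(ps), stouffer_p(ps))
```

Fisher's statistic grows without bound as any single p-value approaches zero, whereas Stouffer-Lipták averages z-scores, so one extreme probe is diluted by its neighbours.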
Although t-statistics were used in the first set of results, the generalized moving average does not depend on that specific test; it requires only that the problem be expressed as a hypothesis test with a p-value at each probe. The second set of results (Section 4.3) illustrates this point with another application, in which the generalized moving average with dependence adjustment could be applied across probes. A standard moving average may not be appropriate in this scenario, since the moderated t-statistics at each probe from the two experiments (CiA and CiR) have different degrees of freedom, and the goal is to identify regions enriched for at least one of the two signals or both. TileMap offers the flexibility to make comparisons across multiple experimental conditions (e.g., mutant 1 < wild type < mutant 2), but in this application it could not easily be adapted to test for binding under the two conditions (personal communication). In this context, the generalized moving average provided an alternative analysis, with predictions near genes that are consistent with the annotation of known Hh targets.
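A minimal sketch, under our own assumptions, of pooling per-probe p-values from two experiments once each has been computed from its own test and degrees of freedom, so that both lie on the common [0, 1] scale. Within each window, the p-values from both experiments are combined with Fisher's test. This pooling rule, the function names and the example inputs are hypothetical and need not match the analysis in Section 4.3, and no dependence adjustment is applied. Because Fisher's test responds strongly to small p-values, a window can reach significance when only one experiment shows enrichment.

```python
import math


def fisher_p(pvals):
    """Fisher's combined probability test (independence assumed)."""
    half_x = -sum(math.log(min(max(p, 1e-300), 1.0)) for p in pvals)  # X / 2
    term, total = 1.0, 1.0
    for i in range(1, len(pvals)):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


def two_experiment_window_pvalues(p_exp1, p_exp2, k):
    """Pool the 2k p-values from two experiments in each window of k probes.

    p_exp1 and p_exp2 are hypothetical per-probe p-values for the same probes,
    each already computed with its own test. The result is stored at the
    window's centre probe; probes without a full window get None.
    """
    if len(p_exp1) != len(p_exp2):
        raise ValueError("both experiments must cover the same probes")
    n = len(p_exp1)
    out = [None] * n
    for start in range(n - k + 1):
        pooled = list(p_exp1[start:start + k]) + list(p_exp2[start:start + k])
        out[start + k // 2] = fisher_p(pooled)
    return out


# Hypothetical per-probe p-values: enrichment in the second experiment only.
ci_a_p = [0.5, 0.5, 0.5, 0.5, 0.5]
ci_r_p = [0.5, 0.001, 0.001, 0.001, 0.5]
print(two_experiment_window_pvalues(ci_a_p, ci_r_p, 3))
```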