The example application uses data from the St. Jude AML97 clinical trial (Crews

*et al.*,

2002; Rubnitz

*et al.* 2009). Ross

*et al.* (

2004) used Affymetrix U133A microarrays to measure gene expression in the leukemic cells of diagnostic bone marrow samples of 42 subjects in this trial. Additionally, PK, pharmacodynamic (PD) and clinical endpoint data were collected for these same subjects. Concentrations of ara-CTP (the active form of the chemotherapy drug cytarabine; ara-C) in leukemic cells of the bone marrow were obtained on the first day (CTP1; after ara-C alone) and the second day (CTP2; after combining cladribine with ara-C) of therapy. The rate of DNA synthesis in leukemic cells from the marrow was measured at diagnosis (baseline) and on days 1 and 2 of the therapy. From these measurements of DNA synthesis rates, we computed the log-ratio of the day 1 rate to the baseline rate (DNA1) and the log-ratio of the day 2 rate to the baseline rate (DNA2). The white blood count (WBC) in peripheral blood is a measure of tumor burden. The log-ratio of the WBC after 48 h of therapy to the WBC at the initiation of therapy was determined. Initial response (RESP) was determined by morphologic examination of a bone marrow aspirate collected after completion of the first course of chemotherapy and categorized as no response (RESP = 0), partial response (RESP = 1) or complete response (RESP = 2). Also, event-free survival (EFS) was defined as zero and considered uncensored for patients who did not achieve complete remission after two courses of therapy. For the remaining patients, EFS was defined as the time elapsed from study enrollment to relapse, development of a second malignancy or death, with patients having experienced none of those events censored at the date of last follow-up. For the purposes of PROMISE analysis in this application, the gene expression data were considered as genomic features and the other variables as endpoint variables. All 22 215 probe sets represented on the microarray were included in the analysis. We have previously noted that excluding probe sets on the basis of present–absent calls may be of limited value (Pounds and Cheng,

2005).

The correlation of gene expression with each endpoint can be measured using published statistical methods. The association of expression with CTP1, CTP2, DNA1, DNA2, WBC and RESP is measured with Spearman's correlation coefficient. For this application, these statistics are denoted

*T*_{ctp1},

*T*_{ctp2},

*T*_{dna1},

*T*_{dna2},

*T*_{wbc} and

*T*_{resp}. The association of expression with the risk of relapse and death (EFS) is measured using the rank-based statistic developed by Jung

*et al.* (

2005) that accounts for follow-up and censoring. For this application, this statistic is denoted

*T*_{efs}. The statistic

*T*_{efs} has the form of a dot product and was scaled so that −1≤

*T*_{efs}≤1. For each endpoint, the association statistics were computed using all subjects with pairwise complete data (i.e. having gene expression data and data for the specific endpoint). This technique for managing missing data allows us to use all available data. The same approach was used for computing permutation statistics.

All AML97 subjects were randomly assigned to receive one of two infusion schedules for ara-C during the initial course of therapy. An amendment to the study protocol added one dose of intrathecally delivered ara-C before the first course of intravenous chemotherapy. Thus, each patient received one of four distinct therapies (short infusion or continuous infusion of ara-C with or without intrathecal ara-C). The statistical analyses of the association of gene expression with the PK endpoints (CTP1 and CTP2), PD endpoints (DNA1, DNA2 and WBC) and clinical endpoints (RESP and EFS) must account for the different therapeutic strategies. Therefore, for each endpoint and expression probe set, the correlation with expression was computed separately for each of four therapy-defined groups of subjects, and then the final correlation statistic was the sample-size weighted average of the group-specific correlations. This type of adjustment for therapy is called a *stratified analysis*, and the therapy-defined groups are called *strata* (or one group is called a *stratum*).

In this application, prior biological and clinical knowledge was used to define the most interesting result for the association of gene expression with the seven endpoints. First, it is most interesting if the correlation of expression with CTP1 and CTP2 are both equal to ±1. For purposes of constructing the vector

*d*, let

*d*_{ctp1}=

*d*_{ctp2}=1. Given this selection for

*d*_{ctp1} and

*d*_{ctp2}, the most interesting correlation of expression with DNA1 and DNA2 is

*d*_{dna1}=

*d*_{dna2}=−1, because the PD effect of ara-CTP is to interfere with DNA synthesis. Interference with DNA synthesis results in cell death and therefore leads to a reduction in WBC. Thus,

*d*_{wbc}=−1. Increased levels of ara−C in leukemic blasts should reduce the amount of tumor in the marrow (a better tumor response), and therefore

*d*_{resp}=1. Effective therapy should reduce the risk of relapse and death, thus

*d*_{efs}=−1. Therefore, by (

2), the PROMISE statistic for this application was defined as

The subscript

*g* is omitted from the right-hand side of (

4) for simplicity of notation. The PROMISE statistic is scaled by 1/7 so that −1≤

*R*_{g}≤1. A positive

*R*_{g} indicates that the expression of probe set

*g* shows a therapeutically beneficial pattern of correlation with the endpoint variables, i.e. higher expression associates with therapeutically desirable values of the endpoint variables. Similarly, negative

*R*_{g} indicates that the expression of probe set

*g* shows a therapeutically detrimental pattern of association with the endpoint variables.

The statistical significance of each individual endpoint's association statistic and the PROMISE statistic were determined using the same set of 10 000 permutations of the assignment of expression data to endpoint data. The permutations were restricted so that data reassignments were performed separately within each therapy-defined stratum because the differences in therapy are important factors for ara-C pharmacology and clinical outcome, as previously described (Rubnitz

*et al.*,

2009). For each gene

*g*, the

*P*-value was computed by letting ρ

_{g}^{0}=0 in Equation (

3). Several interesting biological findings will be reported in detail elsewhere. Here, we describe a few results that illustrate the advantages of PROMISE.

The results for the human equilibrative nucleoside transporter 1 (hENT1) gene (probe set 201802_at) clearly illustrate the interpretative advantage of PROMISE. hENT1 is a solute carrier that brings ara-C into the cell (Hubeek

*et al.*,

2005). Given this role in ara-C metabolism, one would expect hENT1 to show a therapeutically beneficial pattern of association. This was indeed the case ( and ). The expression of hENT1 was positively associated with CTP1, positively associated with CTP2, negatively associated with DNA1, negatively associated with DNA2, negatively associated with WBC, positively associated with RESP and negatively associated with the risk of an EFS event (). However, it is difficult to interpret the statistical significance of the association pattern with the seven individual

*P*-values. The PROMISE analysis indicated that the beneficial pattern of association was very significant (

*R*_{g}=0.23,

*P*=0.0033, rank = 225). Thus, PROMISE identified this gene of known relevance to ara-C metabolism and provided a straightforward interpretation of statistical significance for the interesting pattern of associations with the endpoint variables. The individual endpoint analyses also provided insights that may be helpful for biological interpretation of the results. The associations with DNA1, DNA2 and RESP were strong contributors to the final value of the PROMISE statistic (correlations from 0.30 to 0.40). The associations with CTP1 and CTP2 were moderate contributors, and the associations with WBC and EFS were relatively weak contributors. Other genomic features had similar PROMISE statistics as hENT1, but for some of those features the associations with individual endpoints were substantially different.

| **Table 1.**The association statistic, *P*-value and rank among 22 215 probe sets (by *P*-value) for hENT1 from each individual endpoint analysis and the PROMISE analysis |

In our application, PROMISE clearly had greater power to identify genes with interesting patterns of association. First, PROMISE identified a substantially greater proportion of all genes as significant than did any pairwise overlap analysis (A). In the PROMISE analysis, 498 probe sets were significant at *P*=0.01 level. In contrast, only 92 probe sets were significant at *P*=0.01 for DNA1 and DNA2, the greatest number of overlapping probe sets for any pair of individual endpoint analyses performed at the *P*=0.01 level. By definition, overlap of three or more endpoints detected fewer probe sets at the *P*=0.01 level. No probe set was significant in all seven individual endpoint analyses at a *P*<0.15. Second, PROMISE identified a greater proportion of genes with an interesting pattern of association than did any individual endpoint analysis (B). For 719 probe sets, the statistics measuring the association of expression with the individual endpoints all had signs matching (or all had signs mismatching) those of the interesting result vector *d*. PROMISE identified 154 of these probe sets as significant at *P*=0.01. The individual endpoint analysis for DNA2 detected 34 of the probe sets with an interesting pattern of association at the *P*=0.01 level, the best among any individual endpoint analysis.

A permutation-based GSEA was also performed for each individual endpoint analysis and the PROMISE analysis. The pathway column of the Affymetrix annotation dataset was used to define 233 gene sets. For each gene set, the gene-set enrichment statistic was defined as the average of the absolute value of the member genes' correlation statistics. PROMISE identified 42 of these gene sets as significant at the *P*=0.05 level. Biologically interesting gene sets with significant PROMISE results included the DNA replication reactome gene set (*P*=0.0248) and the gene set for the G1-to-S phase of the cell cycle (*P*=0.0113). These gene sets are very interesting in this application because ara-C interferes with DNA synthesis (or replication), and clearly the cell cycle is very important in cancer. These two gene sets were significant at the *P*=0.05 level in the individual endpoint analysis for DNA1, but not in any other individual endpoint analysis. No gene set was significant in all seven individual endpoint analyses at *P*<0.15. Thus, GSEA based on PROMISE identified more gene sets than did searching for overlap among the results of gene-set analyses for individual endpoints.