J Clin Oncol. 2008 September 20; 26(27): 4376–4384.
PMCID: PMC2736991

Gene Expression Signatures Predictive of Early Response and Outcome in High-Risk Childhood Acute Lymphoblastic Leukemia: A Children's Oncology Group Study



Purpose

To identify children with acute lymphoblastic leukemia (ALL) at initial diagnosis who are at risk for inferior response to therapy by using molecular signatures.

Patients and Methods

Gene expression profiles were generated from bone marrow blasts at initial diagnosis from a cohort of 99 children with National Cancer Institute–defined high-risk ALL who were treated uniformly on the Children's Oncology Group (COG) 1961 study. For prediction of early response, genes that correlated to marrow status on day 7 were identified on a training set and were validated on a test set. An additional signature was correlated with long-term outcome, and the predictive models were validated on three large, independent patient cohorts.

Results


We identified a 24–probe set signature that was highly predictive of day 7 marrow status on the test set (P = .0061). Pathways were identified that may play a role in early blast regression. We have also identified a 47–probe set signature (which represents 41 unique genes) that was predictive of long-term outcome in our data set as well as three large independent data sets of patients with childhood ALL who were treated on different protocols. However, we did not find sufficient evidence for the added significance of these genes and the derived predictive models when other known prognostic features, such as age, WBC, and karyotype, were included in a multivariate analysis.

Conclusion


Genes and pathways that play a role in early blast regression may identify patients who may be at risk for inferior responses to treatment. A fully validated predictive gene expression signature was defined for high-risk ALL that provided insight into the biologic mechanisms of treatment failure.


The current management of children with acute lymphoblastic leukemia (ALL) modulates treatment intensity according to the risk of relapse, which thereby maximizes opportunities for cure and minimizes adverse effects.1

A number of variables have been shown to be predictive of outcome in childhood ALL, including clinical and laboratory features, cytogenetic characteristics of the blast, and early response to chemotherapy.2 These variables are routinely used for treatment assignment, but approximately 20% of children unpredictably suffer a relapse.3

Global gene expression profiling has facilitated the discovery of biologic subgroups in a variety of cancers.4,5 This technique has been shown to accurately classify ALL into cohorts that correspond to known biologic subgroups.6,7 However, it has proved more difficult to identify signatures that are globally predictive of outcome. In the present study, we performed gene expression profiles on leukemic blasts from children who were treated on a single, contemporary Children's Oncology Group (COG) protocol for high-risk (HR) ALL to discover gene expression signatures that are predictive of early response and outcome.


Diagnostic marrow samples from 99 children (age 1 to 18 years) with National Cancer Institute–defined HR B-precursor ALL (age ≥10 years and/or presenting WBC ≥ 50,000/μL) who were treated on the COG 1961 protocol were analyzed.8 We focused on this particular group of patients, because many lack known genetic subtypes predictive of outcome. All patients received a standard four-drug induction and were further classified as slow early responders (SER)—day 7 marrow was M3 (> 25% blasts)—or rapid early responders (RER)—day 7 marrow was M1 (< 5% blasts) or M2 (5% to 25% blasts).
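The eligibility and response definitions above can be written as a few small rules. The following Python sketch is illustrative only: the function names are hypothetical, and the thresholds are taken directly from the text (age ≥ 10 years and/or WBC ≥ 50,000/μL for high risk; M1 < 5%, M2 5% to 25%, M3 > 25% blasts on day 7).

```python
def is_nci_high_risk(age_years: float, wbc_per_ul: float) -> bool:
    """High-risk criteria as stated in the text: age >= 10 years and/or WBC >= 50,000/uL."""
    return age_years >= 10 or wbc_per_ul >= 50_000


def marrow_status(blast_percent: float) -> str:
    """Day 7 marrow category: M1 (< 5%), M2 (5% to 25%), or M3 (> 25% blasts)."""
    if blast_percent < 5:
        return "M1"
    if blast_percent <= 25:
        return "M2"
    return "M3"


def early_response(blast_percent: float) -> str:
    """Rapid early responder (RER) for M1 or M2 marrow, slow early responder (SER) for M3."""
    return "SER" if marrow_status(blast_percent) == "M3" else "RER"
```

For example, `early_response(40)` returns `"SER"`, and a 4-year-old with a WBC of 60,000/μL satisfies `is_nci_high_risk`.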

To determine genetic profiles associated with early response to therapy, we analyzed 82 of 99 patients: 42 patients who had M1 marrow on day 7 were compared with 40 patients who had M3 marrow on day 7. Patients with M2 marrow (n = 17) were excluded to maximize the distinction between responders. To study the genes associated with long-term outcome, we analyzed expression profiles of 59 patients who fulfilled the following criteria: 28 patients who remained in complete continuous remission (CCR) for at least 4 years and 31 patients with marrow relapse within the first 3 years of initial diagnosis. Forty-two samples were common to both the early response and outcome analyses. Patient characteristics are listed in Appendix Table A1 (online only).

RNA Extraction and Amplification and DNA Arrays

Total RNA was extracted from cryopreserved blasts from the COG cell bank by using RNeasy Midi kits (Qiagen, Valencia, CA) followed by the MinElute kit (Qiagen). Fifty nanograms of total RNA were used as template in a double-amplification protocol by using the RiboAmp OA kit (Arcturus, Mountain View, CA) according to the manufacturer's recommendations. In vitro transcription was completed with biotinylated UTP and CTP for labeling by using the Enzo BioArray HighYield RNA Transcript Labeling kit (Enzo Diagnostics, Farmingdale, NJ). Twenty micrograms of labeled cRNA were fragmented and hybridized to Affymetrix U133Plus2.0 microarrays (Affymetrix, Santa Clara, CA). These arrays contain 54,675 probe sets, which represent approximately 38,500 genes.

Screening Analysis for Cytogenetic Risk Group

Patients were tested by reverse transcriptase polymerase chain reaction (RT-PCR) for the presence of each of four common prognostic translocations: t(1;19), t(4;11), t(9;22), and t(12;21). The t(1;19), t(4;11), and t(12;21) fusion products were assayed by qualitative RT-PCR, whereas the t(9;22) analysis was done quantitatively by using TaqMan technology (Applied Biosystems, Foster City, CA). Primers are listed in Appendix Table A2, and methods for the assays are detailed in the Appendix (online only).

Data Analysis

Data generated from the COG 1961 samples discussed in this publication have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus and are accessible through series accession number GSE7440.

Gene expression values were generated by using Affymetrix MAS 5.0 Software. Expression levels were scaled to an average value of 1,000 per gene chip9 and were log transformed. In each analysis, the probe sets of 53 nonhuman genes and those that did not receive present calls in at least 30% of the samples were removed from the study.
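The preprocessing steps described above (scaling each chip to an average of 1,000, log transformation, and removal of probe sets with too few present calls) can be sketched as follows. This is a simplified illustration, not the MAS 5.0 algorithm: it assumes a plain arithmetic mean for scaling and a base 2 logarithm, neither of which is specified in the text, and the function and parameter names are hypothetical.

```python
import math


def preprocess(signals, present_calls, target_mean=1000.0, min_present_fraction=0.30):
    """Scale, log-transform, and filter a chips-by-probe-sets matrix.

    signals[chip][probe] holds positive signal values; present_calls[chip][probe]
    is True when the probe set received a present call on that chip.
    Returns (filtered log-scale matrix, indices of the probe sets that were kept).
    """
    n_chips = len(signals)
    n_probes = len(signals[0])

    # Scale each chip so its mean signal equals target_mean, then take log2
    # (the log base and the use of a plain mean are assumptions of this sketch).
    logged = []
    for chip in signals:
        factor = target_mean / (sum(chip) / len(chip))
        logged.append([math.log2(value * factor) for value in chip])

    # Keep probe sets called present in at least min_present_fraction of the samples.
    kept = [
        j for j in range(n_probes)
        if sum(1 for i in range(n_chips) if present_calls[i][j]) / n_chips >= min_present_fraction
    ]
    return [[row[j] for j in kept] for row in logged], kept
```

Removal of the nonhuman control probe sets is not shown; it would amount to dropping a fixed list of probe set identifiers before the present-call filter.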

For prediction of early response, the samples (n = 82) were randomly divided into a training set (28 RER, 26 SER) and a test set (14 RER, 14 SER). A nearest shrunken centroids prediction model with the subset of genes that were best associated with early response (RER v SER) was determined by using Prediction Analysis of Microarrays (PAM)10 packaged in R (Stanford University Labs, Palo Alto, CA) with a 200 × 10-fold cross-validation procedure on the training data set. This model was used to make predictions on the test set. Logistic regression was used to test the significance of the subset of genes and of the class predictor when analysis was adjusted for clinical covariates, such as age and presenting WBC.
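As an illustration of the random division into training and test sets, the sketch below draws a fixed number of test samples from each response class. The text does not state how the randomization was performed, so the per-class sampling, the seed, and the function name are assumptions chosen only to reproduce the reported set sizes (54 training, 28 test).

```python
import random


def stratified_split(labels, n_test_per_class, seed=0):
    """Return (train_indices, test_indices), drawing n_test_per_class[c] test samples from class c."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    train, test = [], []
    for cls, n_test in n_test_per_class.items():
        members = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(members)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# 42 rapid and 40 slow early responders, as in the early response analysis.
labels = ["RER"] * 42 + ["SER"] * 40
train_idx, test_idx = stratified_split(labels, {"RER": 14, "SER": 14})
```

The model fitting itself (nearest shrunken centroids with repeated 10-fold cross validation) is not reproduced here; the sketch only covers the sample split.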

For long-term outcome prediction, t test and adjusted P value (or false discovery rate [FDR]), as proposed by Benjamini and Hochberg,11 were utilized to select a subset of probe sets that were statistically associated with outcome. A logistic regression12 model was used to test whether each of the genes added prognostic value beyond that of known clinical covariates. Logistic regression with various variable selection options11 was utilized to build the best models for predicting outcome on the basis of clinical covariates and the genes identified by the t test with the adjusted P value (or FDR). Prediction accuracies of these models were estimated by using an unbiased, leave-one-out cross validation (LOOCV).
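The Benjamini and Hochberg adjustment mentioned above can be illustrated with a short function. This is a generic sketch of the step-up procedure, not the authors' code; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Probe sets whose adjusted P value falls at or below a chosen threshold (for example, 0.05) would then be retained as the candidate signature.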

Three independent microarray data sets of childhood B-precursor ALL were used for validation of the outcome signature: a set of 220 patients treated on Pediatric Oncology Group (POG) trials,13 145 patients treated on German Cooperative Study Group for Childhood ALL (COALL) protocols,14 and 92 patients from Dutch Childhood Oncology Group (DCOG) protocols.15 The samples of the POG, COALL, and DCOG data sets were hybridized to Affymetrix U95Av2, U133A, and U133Plus2.0 arrays, respectively. Logistic regression was used to determine the association of the significant probe sets in the POG data set, and Cox regression was used in the COALL and DCOG data sets. We next built models for outcome prediction. Of the 47 probe sets identified in the COG 1961 data set, 18 could be matched by 20 probe sets of the U95Av2 microarrays. We constructed logistic regression models with these 20 probe sets and with three clinical covariates (sex, age, and WBC) as predictors of outcome. Briefly, model I (LP1) was based on three genes, model II (LP2) on five genes, and model III (LP3) on four genes. Receiver operating characteristic (ROC) accuracy, t test, Mann-Whitney U test, Cox proportional hazards regression, and logistic regression were used to validate these predictive models on the independent data sets. In addition to the three statistical models mentioned above, we considered a simple linear combination of the expression values of the probe sets that match the 47 probe sets in each of the three validation cohorts (LPV).


Prediction of Early Response

Analysis with PAM on the training set (n = 54) led to a model comprising 24 probe sets with a minimal average cross-validated error rate of 0.38 that best characterized early response (FDR, 3.6%). The Affymetrix probe set identifications and gene descriptions in rank order can be found in Table 1.

Table 1.
Significant Probe Sets Predictive of Early Response

To validate the significance of the 24 probe sets, we performed t test and logistic regression analyses on the expression values in the test data set (n = 28). Although there was a positive trend of association between all probes in the test and training sets, eight reached statistical significance. The estimated ROC accuracy of the predicted score on the test set was 0.7755 (P = .0061; Fig 1). The overall misclassification rate was 0.25 (sensitivity = .7143 and specificity = .7857). The observed and predicted early responses significantly correlated with each other (odds ratio, 8.33; P (one-sided Fisher's exact test) = .011).
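The reported test set rates can be checked with simple arithmetic. The sketch below assumes a 2 × 2 table of 10, 4, 3, and 11 samples; these counts are not given in the text and are inferred from the reported rates on 14 + 14 test samples (10/14 ≈ .7143, 11/14 ≈ .7857). Which class was treated as positive is also not stated, and the function name is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    whose top-left cell is at least as large as the observed a.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    return sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)) / total


# Counts inferred from the reported rates (an assumption, see the lead-in).
tp, fn, fp, tn = 10, 4, 3, 11
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
misclassification = (fn + fp) / (tp + fn + fp + tn)
p_one_sided = fisher_one_sided(tp, fn, fp, tn)
```

Under these assumed counts the misclassification rate is 7/28 = 0.25, and the one-sided P value is close to .011, in line with the values reported above.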

Fig 1.
Receiver operating characteristic (ROC) curve of the predicted score of early response on the test set. The model that comprised 24 probe sets that were derived from the training set was used for the prediction of early response on the test set. The ROC ...

Functional Analysis of Genes Related to Early Response

A list of 188 differentially expressed probe sets (RER v SER) that were selected by PAM on the entire data set (N = 82; FDR ≤ 10%) was used for the detection of the relative enrichment of genes according to GeneOntology15a terms with the help of the L2L tool.16 Genes significantly over-represented in RER patients included those involved with induction of apoptosis and hematopoietic development, whereas genes involved with cell growth and metabolism were over-represented in SER patients (Table 2).

Table 2.
Enrichment Analysis of Genes Associated With Early Response

Prediction of Long-Term Outcome

Gene expression profiles from 59 patients (28 CCR; 31 relapse) were analyzed to identify genes related to long-term outcome. By using a threshold FDR of 5%, we identified 47 probe sets (which represented 41 unique genes) that were significantly associated with outcome. The Affymetrix probe set identifications and gene descriptions are listed in rank order in Table 3, with t test P values that compare CCR and relapse for the selected genes. Figure 2 shows a heatmap of the expression values.

Fig 2.
Genes differentially expressed in patients that remained in complete continuous remission v in those that relapsed. Heatmap of the 47–probe set signature that was predictive of outcome (which represented 41 unique genes).
Table 3.
Probe Sets Differentially Expressed Between Patients in CCR and Those Who Experienced Relapse

To obtain an unbiased estimate of the prediction accuracy of each of the three models (LP1 through LP3), we performed LOOCV. The misclassification rates for the three models were 0.2542, 0.3051, and 0.2881, respectively (sensitivity = 0.643, 0.643, and 0.643, respectively; specificity = 0.839, 0.742, and 0.774, respectively). The ROC accuracies were 0.8065, 0.7154, and 0.7316 (P < .0001, < .002, and < .001, respectively). These LOOCV results indicated that the three models were significantly predictive of outcome.
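The leave-one-out procedure can be illustrated with a deliberately simple classifier. The sketch below substitutes a nearest-class-mean rule on a single feature for the logistic regression models used in the study, so it shows only the mechanics of LOOCV: each sample is held out in turn, the rule is refit on the remaining samples, and the held-out sample is then predicted. The function name is hypothetical.

```python
def loocv_error(values, labels):
    """Leave-one-out misclassification rate of a nearest-class-mean rule on one feature."""
    errors = 0
    for held_out in range(len(values)):
        # Fit class means on every sample except the held-out one.
        sums, counts = {}, {}
        for i, (value, label) in enumerate(zip(values, labels)):
            if i == held_out:
                continue
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
        means = {label: sums[label] / counts[label] for label in sums}
        # Predict the held-out sample as the class with the closest mean.
        predicted = min(means, key=lambda label: abs(values[held_out] - means[label]))
        errors += predicted != labels[held_out]
    return errors / len(values)
```

Because the held-out sample never contributes to the fitted means, the resulting error rate is not biased by evaluating a rule on the data used to fit it.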

Validation of Outcome Prediction Models on Independent Patient Cohorts

Three large patient cohorts—POG, COALL, and DCOG—were used as independent sets for validation of the 47–probe set signature. Notably, the trend in the DCOG and COALL data sets of the association of the matched probe sets all agree with that observed in the 1961 data set, and this was also true of the POG data set with three exceptions (Appendix Table A2, online only).

The POG data set consisted of 220 patient cases (4-year CCR, n = 95; relapse, n = 125) of childhood B-precursor ALL. The estimated ROC accuracies for the three prediction models were 0.6119, 0.5820, and 0.5674, respectively. By the one-sided Mann-Whitney U test, the P values were .00226, .0187, and .0436, respectively, which indicated that the predicted LP value of each of the three models was significantly predictive of outcome in the independent POG set. To further validate the predictive value of the three models, we fit univariate and multivariate logistic regression models (Table 4). The LPs of all three models were significantly associated with outcome (P < .05 for all). However, we did not find statistical evidence for the prognostic significance of the majority of the models when analysis was adjusted for age and WBC or for karyotype. Only model I retained prognostic significance when age, WBC, and karyotype were considered. Models II and III were significant after analysis was adjusted for karyotype but not for age and WBC. Similar results were obtained when only the HR subset of patients was analyzed (data not shown). Logistic regression with LPV (ie, the weighted sum of expression values of the 20 probe sets common to the COG 1961 and POG data sets) as the explanatory variable indicated that LPV also was associated with a good outcome; P (one-tailed Wald test) = .007.
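ROC accuracy and the Mann-Whitney U test are closely linked: the area under the ROC curve equals the U statistic divided by the number of case pairs. A minimal sketch of that calculation follows; the function name and the example scores are hypothetical.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve from pairwise comparisons.

    Counts the fraction of (positive, negative) pairs in which the positive case
    has the higher score; ties count one half. This equals the Mann-Whitney U
    statistic divided by the number of pairs.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))
```

For example, `roc_auc([0.9, 0.8, 0.4], [0.5, 0.3])` ranks five of the six pairs correctly, and a value of 0.5 corresponds to a score with no discriminating ability.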

Table 4.
Validation of the Outcome Signature on POG Data Set

We next validated the three prediction models by using COALL data with Cox proportional hazards regression (Table 5). We again noted that the predicted LP values of all three models were significantly associated with outcome (P < .05), and they remained significant after analysis was adjusted for age and WBC but not for karyotype. In Cox regression, LPV was significantly associated with outcome (P = .0002). The DCOG data set comprised 92 B-lineage diagnostic samples (4-year CCR, n = 67; relapse, n = 25). By the one-sided Wilcoxon rank sum test, the P values for the three models were .030, .020, and .0635, respectively, which indicated a significant or marginally significant association with outcome. Cox proportional hazards regression was performed to additionally validate the association of the predicted values with outcome (Table 6). We noted again that the hazard ratios were all less than 1, which indicated a consistent trend that high predicted values were associated with good outcome. In the DCOG data set, the three models were statistically significant when considered on their own (univariate analysis) but were not after analysis was adjusted for WBC, age, and karyotype. Logistic regression yielded similar results (Appendix Table A3, online only). LPV with all 47 probe sets was significant (P = .02, Appendix Table A5, online only).

Table 5.
Validation of the Outcome Signature on COALL Data Set
Table 6.
Validation of the Outcome Signature on DCOG Data Set


The goal of our study was to identify gene expression signatures in diagnostic samples that are predictive of early response to therapy and overall outcome in children with National Cancer Institute–defined HR ALL. All samples studied in these experiments were from patients who were treated on a single, contemporary protocol and who received intensified therapy according to a COG-modified Berlin-Frankfurt-Munster backbone, which thus minimized the effects of treatment variables.

Early response to therapy has proven to be one of the strongest predictors of outcome and now is routinely used to stratify patients according to the risk of relapse.17 We were able to identify and validate a gene expression signature that correlated with the kinetics of regression of tumor burden, as assessed by the bone marrow blast content on day 7. Apoptosis-facilitating genes, such as BIM and PDCD6, were upregulated in RER patients, whereas multiple genes involved in cell adhesion (eg, GPR56, PARVG), cell proliferation (eg, CKLF, BMP2), and antiapoptosis (eg, BCL2, SOCS2) were upregulated in SER patients. If this signature is validated with additional research, more rapid approaches to assessment of gene expression could be used so that augmented therapy might be deployed early—within the first few days of diagnosis—to overcome slow response and possibly the emergence of drug-resistant clones and, ultimately, to improve outcome.

Other investigators also have sought to identify gene expression profiles associated with early response to therapy. Two recent publications from Flotho et al18,19 have portrayed signatures that correlated with minimal residual disease at day 19 (reference 18) and at day 46 (reference 19) of induction. Though only five of 44 probe sets from the day 19 signature reached statistical significance in our data set of day 7 response, the trend of association for all the probe sets was remarkably strong. Not surprisingly, this trend was not observed with the day 46 signature (data not shown). Previous studies show that blast reduction is quite steep in the first 2 weeks of induction and is much slower thereafter.20 Thus, although day 7 bone marrow morphology and end-induction minimal residual disease may correlate,21 it is likely that fundamental differences exist in the mechanisms of leukemia cell death that occurs in early compared with late induction.

Though various groups have performed microarray experiments on childhood ALL samples, it has proved difficult to identify a prognostic signature at diagnosis. For example, Yeoh et al7 were able to detect distinct expression profiles that predicted relapse in T-cell acute lymphoblastic leukemia and hyperdiploid ALL but not in other subtypes. Although expression of OPAL1 predicts outcome in ALL in some studies, it has not been validated in others, which suggests that differences in treatment may influence the prognostic impact of expression profiles.22 Other investigators have correlated gene expression signatures with in vitro drug response.14,23 However, this drug resistance profile was not selected for its prognostic value and, hence, may not represent the best selection of outcome-predictive genes. Despite these challenges, we have identified a gene expression signature that was predictive of long-term outcome and was validated in three independent cohorts of diagnostic samples from children who were treated on different protocols, which thus yielded an accurate perspective on the validity and reproducibility of the results.

Almost none of the genes in our predictive signature were identified in the studies mentioned above that looked at drug resistance and/or outcome. However, studies that have used microarray methodology to discover predictive signatures in other cancers also have shown little overlap in gene lists. Although these gene lists may not always be concordant between data sets, each signature still may be significantly predictive across the data sets. For example, five recently published predictive gene sets for outcome in breast cancer showed little overlap between sets.24 However, four of five were predictive of outcome in a single data set of 295 women, which emphasizes that, despite the lack of overlap, the signatures are reflective of common biologic subsets. This is consistent with our findings, which demonstrated that the individual gene expression signatures and the models derived from the COG samples could predict outcome in three different cohorts of patients.

The utilization of predictive signatures in clinical cancer trials is eagerly awaited. The application of array technology to define additional patients with ALL who have a poor outcome may be more difficult given the high cure rate of ALL and the elucidation of many well-established risk factors to date. One of the most crucial findings of our study was that, although gene expression signatures correlated with outcome in univariate analyses in multiple data sets, they lost much significance when well-known outcome predictors, like age, initial WBC, and genotype, were taken into account. A logical interpretation of these findings is that the most important variables associated with treatment failure in ALL have been identified already. However, the inability to accurately predict outcome uniformly by using these conventional variables may be related in many instances to host factors. In addition, measurements of gene expression do not take into account important events, such as post-translational modifications. Another explanation is that prognostic signatures may exist within biologic subtypes of ALL only. It has been established that gene expression profiles correlate with ALL cohorts defined by molecular changes, such as translocations and ploidy. We specifically focused our efforts on National Cancer Institute–defined HR ALL, because known genetic subtypes account for only a minority of patients in this cohort, and we sought to identify novel biologic subtypes associated with outcome by using gene expression profiling. Our inability to define such a group might reflect the existence of smaller biologic subsets within this population that may not be possible to detect with the number of patient cases studied here. However, our study and similar ones by others, even if not predictive in multivariate analysis, are likely to lead to a biologic understanding of why certain clinical and laboratory variables are associated with clinical outcome. Such information is essential to derive more effective, tumor-specific therapies.

In summary, we have identified a gene expression signature that is significantly predictive of outcome in childhood ALL, but it does not seem to provide additional information beyond that contained in already established prognostic variables. The analysis of a larger number of samples may allow investigators to discover gene signatures that provide additional prognostic information. Strict adherence to uniform protocols for sample acquisition, processing, and array experimentation may facilitate comparison between data sets.25 In addition, analysis of gene expression profiles may lead to a biologic understanding of why clinical and laboratory variables are associated with outcome, and this information potentially may be exploited therapeutically.


The authors indicated no potential conflicts of interest.


Conception and design: William L. Carroll, Stephen P. Hunger

Financial support: William L. Carroll, Cheryl L. Willman

Administrative support: Harland Sather, Stephen P. Hunger, William L. Carroll

Provision of study materials or patients: Stephen P. Hunger, Nita Seibel, Rob Pieters, Monique L. den Boer, Martin A. Horstmann, Cheryl L. Willman

Collection and assembly of data: Deepa Bhojwani, Huining Kang, Harland Sather, Wenjian Yang, Monique L. den Boer, Renee X. Menezes, Jeffrey W. Potter

Data analysis and interpretation: Deepa Bhojwani, Huining Kang, Monique L. den Boer, Renee X. Menezes, Wenjian Yang, Naomi P. Moskowitz, Dong-Joon Min, Richard Harvey

Manuscript writing: Deepa Bhojwani, Huining Kang, Monique L. den Boer, Elizabeth A. Raetz, Mary V. Relling, Stephen P. Hunger, William L. Carroll

Final approval of manuscript: Deepa Bhojwani, Huining Kang, Stephen P. Hunger, Monique L. den Boer, Rob Pieters, Mary V. Relling, Cheryl L. Willman, William L. Carroll



Methods for polymerase chain reaction.

Five hundred nanograms of patient RNA was converted into cDNA using Moloney murine leukemia virus–reverse transcriptase in a 20-μL reaction volume (Invitrogen Corp, Carlsbad, CA). This cDNA was diluted to a final volume of 50 μL by the addition of 30 μL 1× TE. For the qualitative polymerase chain reaction (PCR) analysis, 5 μL of this diluted cDNA (equivalent to 50 ng of starting RNA) was subjected to 40 cycles of amplification with the appropriate primers for each particular translocation in a model 9700 thermocycler (Applied Biosystems, Foster City, CA).

After amplification, the products of the t(1;19), t(12;21), and t(4;11) reactions were analyzed using capillary electrophoresis with the DNA 1,000 chips and a model 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA). Those samples showing products consistent with the predicted translocation sizes were verified by Southern blot analysis and hybridization with fluorescein-labeled oligonucleotide probes. Detection was done using a chemiluminescent detection kit (DAKO Corp, Carpinteria, CA). Quantitative PCR for the t(9;22) translocations was performed on the ABI model 7900 (Applied Biosystems) using primers to detect both the e1a2 and b2a2/b3a2 forms as well as an endogenous control gene, EEF2. The equivalent of 50 ng of starting RNA (5 μL of the diluted cDNA) was used for each of the three reactions. A fusion probe for the e1a2 product was used to quantify its levels. The b2a2 and b3a2 products were quantified together with a consensus probe in the b2 exon that is common to both forms. The EEF2 gene was used to show the quality and quantity of the cDNA as well as to normalize the samples. Although the detection of the t(9;22) products was performed quantitatively, the results for the purposes of assignment are scored as either positive or negative.

Statistical models for outcome prediction.

The logistic regression model can be written as

$$\log\left(\frac{P}{1-P}\right) = \mathrm{LP}$$

where P is the probability of complete continuous remission in the selected population and LP is a linear combination of the predictors. An optimal subset of variables was selected to build the model using three variable selection methods (backward, forward, and stepwise) with a significance level of .05 for predictor to enter or to stay in the model. Three different best models, which included several genes and no clinical variables as predictors of outcome, were identified (Appendix Table A3).

In addition, LPV was the simple linear combination of the expression values of the probe sets that match the 47 probe sets in each of the three validation cohorts.

$$\mathrm{LPV} = \sum_{i} w_i x_i$$

where $x_i$ is the expression value of the $i$th matched probe set and the weights $w_i$ were the t test statistics calculated in the Children's Oncology Group 1961 data set. The LPV models included 20 probe sets in the Pediatric Oncology Group data set, 31 probe sets in the German Cooperative Study Group for Childhood ALL (acute lymphoblastic leukemia) data set, and 47 probe sets in the Dutch Childhood Oncology Group data set.
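The LPV score is a weighted sum of expression values, with the t test statistics from the Children's Oncology Group 1961 data set as weights. The sketch below shows the calculation for one sample. The probe set identifiers are taken from Table A4, but the weights and expression values are made up for illustration, and the function name is hypothetical.

```python
def lpv(expression, t_statistics):
    """Weighted sum over the probe sets present in both mappings.

    expression maps probe set ID to the sample's expression value;
    t_statistics maps probe set ID to its weight from the discovery data set.
    """
    shared = expression.keys() & t_statistics.keys()
    return sum(t_statistics[probe] * expression[probe] for probe in shared)


# Hypothetical weights and expression values, for illustration only.
weights = {"205401_at": -3.0, "212576_at": 4.0, "208820_at": 2.5}
sample = {"205401_at": 9.0, "212576_at": 10.0}
score = lpv(sample, weights)
```

Restricting the sum to shared probe sets mirrors the situation in the validation cohorts, where only some of the 47 probe sets could be matched on each array platform.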

Table A1.

Patient Characteristics

Patient | Sex | Age (months), WBC (×10^9/L) | Translocation | Description
O 1 | Male | 4659,300 | | RER
O 2 | Female | 1348,700 | t(1;19) | RER, CCR
O 3 | Male | 2570,000 | | SER
O 4 | Male | 170732,000 | t(4;11) | RER
O 5 | Female | 208314,600 | | SER
O 6 | Female | 1612,100 | t(12;21) | RER
O 7 | Female | 13299,500 | t(12;21) | SER
O 8 | Male | 11866,700 | | RER
O 9 | Female | 1971,500 | | SER
O 10 | Female | 2765,200 | | SER
O 11 | Female | 19844,580 | | RER
O 12 | Female | 16161,700 | | SER
O 13 | Male | 6165,800 | | RER
O 14 | Male | 1872,950 | | SER
O 15 | Male | 20268,000 | t(9;22) | RER
O 16 | Male | 80144,000 | | RER
O 17 | Female | 1864,100 | | SER
O 18 | Female | 3192,800 | t(1;19) | RER
O 19 | Female | 1719,800 | t(1;19) | RER
O 20 | Male | 17761,800 | | SER, relapse
O 21 | Female | 1292,500 | | RER
O 22 | Female | 2094,000 | | RER
O 23 | Male | 1338,000 | t(1;19) | RER
O 24 | Female | 13430,700 | t(1;19) | RER
O 25 | Male | 14415,600 | | SER
O 26 | Female | 19110,500 | | RER
O 27 | Male | 3484,700 | t(1;19) | RER
O 28 | Female | 3997,000 | | SER
O 29 | Female | 16191,000 | | RER
O 30 | Male | 15850,000 | t(9;22) | SER
O 31 | Female | 117300,000 | | SER
O 32 | Female | 14279,000 | | RER
O 33 | Male | 21368,300 | | SER
O 34 | Female | 3964,900 | | SER
O 35 | Male | 6688,100 | | SER
O 36 | Female | 15034,400 | | RER
O 37 | Male | 15454,000 | t(9;22) | SER, relapse
O 38 | Female | 5076,400 | t(12;21) | RER, CCR
O 39 | Female | 13650,300 | | SER, CCR
O 40 | Female | 1256,900 | t(12;21) | RER, CCR
O 41 | Male | 12610,700 | | SER, relapse
O 42 | Female | 12987,600 | | SER, relapse
O 43 | Male | 17745,900 | | SER, CCR
O 44 | Female | 4190,800 | | SER, CCR
O 45 | Male | 1674,400 | t(12;21) | SER, CCR
O 46 | Male | 17693,500 | | RER, relapse
O 47 | Male | 52165,000 | t(12;21) | SER, relapse
O 48 | Male | 70121,500 | t(9;22) | SER, relapse
O 49 | Male | 3986,700 | | RER
O 50 | Male | 109253,100 | t(4;11) | RER, relapse
O 51 | Male | 12868,100 | t(1;19) | RER, relapse
O 52 | Male | 158164,000 | | SER, relapse
O 53 | Male | 19328,000 | | RER, relapse
O 54 | Female | 1851,800 | | RER, relapse
O 55 | Female | 4164,500 | | RER, CCR
O 56 | Male | 8365,000 | | SER, CCR
O 57 | Female | 29178,000 | t(12;21) | SER, CCR
O 58 | Male | 1399,440 | | RER, CCR
O 59 | Male | 10158,700 | | RER, relapse
O 60 | Male | 40106,000 | | SER, CCR
O 61 | Male | 225262,800 | | SER, relapse
O 62 | Female | 12512,600 | t(12;21) | RER, CCR
O 63 | Male | 12189,300 | | SER, relapse
O 64 | Male | 13571,670 | | SER, relapse
O 65 | Female | 109672,000 | t(9;22) | SER, CCR
O 66 | Male | 18982,400 | | RER, relapse
O 67 | Male | 1996,000 | | RER, CCR
O 68 | Female | 11615,8000 | t(4;11) | SER, relapse
O 69 | Male | 19191,800 | | RER, relapse
O 70 | Male | 18336,000 | | RER, CCR
O 71 | Male | 188303,900 | t(9;22) | SER, relapse
O 72 | Female | 126165,900 | | RER
O 73 | Male | 1694,600 | | RER, CCR
O 74 | Male | 106271,700 | | SER, relapse
O 75 | Male | 1389,900 | | RER
O 76 | Male | 8055,000 | | SER
O 77 | Male | 21417,900 | | RER
O 78 | Female | 1586,700 | | SER
O 79 | Male | 15362,800 | | SER, relapse
O 80 | Male | 7551,100 | | SER, CCR
O 81 | Male | 19131,100 | | RER, CCR
O 82 | Female | 19138,550 | | CCR
O 83 | Male | 1794,250 | | Relapse
O 84 | Male | 79325,900 | | Relapse
O 85 | Male | 27209,000 | t(1;19) | Relapse
O 86 | Male | 36113,000 | | CCR
O 87 | Female | 146315,200 | | Relapse
O 88 | Male | 14119,900 | t(1;19) | CCR
O 89 | Male | 146158,000 | | Relapse
O 90 | Female | 14828,400 | | CCR
O 91 | Male | 17344,400 | | CCR
O 92 | Male | 21398,600 | | Relapse
O 93 | Male | 1381,800 | | CCR
O 94 | Male | 12512,900 | | RER, CCR
O 95 | Male | 1262,200 | | CCR
O 96 | Male | 208108,000 | | Relapse
O 97 | Male | 152170,400 | | Relapse
O 98 | Female | 18960,200 | | CCR
O 99 | Male | 178260,500 | t(4;11) | Relapse

NOTE. Bold text indicates patients were common in both analyses (early response and outcome).

Abbreviations: RER, rapid early responder; SER, slow early responder; CCR, complete continuous remission.

Table A2.

Primer Sequences for RT-PCR

AssayPrimer Sequence
EEF2 (TaqMan)
BCR-ABL (TaqMan for both B2/3 and E1 forms)
MLL-AF4 (nested)

Abbreviations: RT-PCR, reverse transcriptase polymerase chain reaction; EEF2, eukaryotic translation elongation factor 2; TEL, translocation ETS leukemia; AML, acute myeloid leukemia; BCR, break point cluster region; ABL, Abelson murine leukemia viral oncogene; PBX, pre B-cell leukemia transcription factor; MLL, mixed lineage leukemia; AF4, ALL1 fused gene from chromosome 4.

Table A3.

Logistic Regression Models

Model I (backward): LP1 = −66.128 + 4.0681 × (VBP1) − 2.1351 × (HSPA8) + 4.6574 × (MGRN1)
Model II (forward): LP2 = −1,170.3 + 47.9138 × (YWHAZ) + 32.0034 × (VBP1) − 16.0348 × (AGPS) + 4.8997 × (PTK2) + 45.9633 × (MGRN1)
Model III (stepwise): LP3 = −238.6 + 6.3478 × (YWHAZ) + 7.2844 × (VBP1) + 0.8561 × (PTK2) + 8.5699 × (MGRN1)
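To show how a linear predictor from Table A3 translates into a predicted probability of complete continuous remission, the sketch below evaluates Model I and applies the inverse of the logistic link. The coefficients are those listed for Model I; the expression values in the usage note are hypothetical, and real inputs would need to be on the same log-transformed scale that was used to fit the model, which this sketch cannot verify.

```python
import math

# Coefficients of Model I (backward selection) as listed in Table A3.
MODEL_I = {"intercept": -66.128, "VBP1": 4.0681, "HSPA8": -2.1351, "MGRN1": 4.6574}


def linear_predictor(expression, model=MODEL_I):
    """LP = intercept + sum of coefficient x expression value for each gene in the model."""
    return model["intercept"] + sum(
        coef * expression[gene] for gene, coef in model.items() if gene != "intercept"
    )


def probability_of_ccr(lp):
    """Inverse logit: P = 1 / (1 + exp(-LP))."""
    return 1.0 / (1.0 + math.exp(-lp))
```

For instance, with hypothetical expression values of 10.0 for all three genes, the linear predictor is slightly below zero, so the predicted probability is slightly below one half.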

Table A4.

Validation for 47 Probe Sets (1)

Columns: U133 ID | High → | U95 ID | Validation results | Symbol | Gene Description
Validation results are separated by semicolons and keep the source order (Validation With POG Data, Validation With COALL, Validation With DCOG). Each entry gives the ratio (Odds Ratio of CCR for POG data; Hazard Ratio for COALL and DCOG), High →, P, and P Adjusting for Subtype. Not every probe set has an entry for all three data sets.

35666_at | CCR | | 0.8270 CCR .1652 .2244; 0.5347 CCR .0083 .0150 | SEMA3F | Sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3F
227877_at | Fail | No match | 3.4569 Fail .0003 .0079 | | Similar to annexin II receptor (LOC389289), mRNA
227131_at | CCR | No match | 0.3778 CCR .0073 .1170 | MAP3K3 | Mitogen-activated protein kinase kinase kinase 3
205401_at | Fail | 39225_at | 0.7709 Fail .0308 .0444; 1.4191 Fail .0227 .2417; 1.7024 Fail .0114 .0302 | AGPS | Alkylglycerone phosphate synthase
208687_x_at | Fail | 1179_at | 0.8418 Fail .1050 .3281; 1.5373 Fail .0246 .2826; 1.7072 Fail .1138 .1809 | HSPA8 | Heat shock 70 kDa protein 8
212229_s_at | CCR | No match | 0.1971 CCR .0005 .0023 | FBXO21 | F-box only protein 21
212576_at | CCR | 32235_at | 1.4168 CCR .0073 .0526; 0.5066 CCR .0355 .0157; 0.1713 CCR .0061 .1709 | MGRN1 | Mahogunin, ring finger 1
225446_at | CCR | No match | 0.7464 CCR .2590 .3139 | C21orf107 | Chromosome 21 open reading frame 107
224793_s_at | CCR | No match | 0.5579 CCR .0332 .0333 | TGFBR1 | Transforming growth factor, β receptor I (activin A receptor type II-like kinase, 53 kDa)
221840_at | CCR | No match | 0.5599 CCR .0001 .1409; 0.4861 CCR .0073 .0796 | PTPRE | Protein tyrosine phosphatase, receptor type, E
203514_at | CCR | No match | 0.2618 CCR < .0001 .0103; 0.5114 CCR .0498 .4390 | MAP3K3 | Mitogen-activated protein kinase kinase kinase 3
1559018_at | CCR | No match | 0.7534 CCR .1851 .2120 | PTPRE | Protein tyrosine phosphatase, receptor type, E
217499_x_at | Fail | No match | 1.6161 Fail .0358 .3588; 2.9133 Fail .0661 .4768 | OR7E47P | Olfactory receptor, family 7, subfamily E, member 47 pseudogene
224187_x_at | Fail | 1180_g_at | 0.7093 Fail .0078 .0846; 2.0577 Fail .0639 .0990 | HSPA8 | Heat shock 70 kDa protein 8
221891_x_at | Fail | 33820_g_at | 0.8539 Fail .1247 .4082; 1.6161 Fail .0399 .0533; 1.6902 Fail .1898 .2258 | HSPA8 | Heat shock 70 kDa protein 8
201642_at | CCR | 41140_at | 0.8839 Fail .8166 .8323; 0.8781 CCR .2963 .4960; 1.0325 Fail .5284 .6506 | IFNGR2 | Interferon γ receptor 2 (interferon γ transducer 1)
218418_s_at | CCR | No match | 0.6570 CCR .0065 .1875; 0.4928 CCR .0008 .0135 | ANKRD25 | Ankyrin repeat domain 25
242305_at | Fail | No match | 4.5128 Fail .0103 .0057 | | CDNA FLJ42757 fis, clone BRAWH3001712
216035_x_at | CCR | No match | 0.5543 CCR .0000 .0222; 0.5389 CCR .0099 .0514 | TCF7L2 | Transcription factor 7-like 2 (T-cell specific, HMG-box)
1556321_a_at | CCR | No match | 0.2974 CCR .0037 .0127 | | mRNA full-length insert cDNA clone EUROIMAGE 283668
235014_at | Fail | No match | 2.6793 Fail .0476 .0921 | LOC147727 | Hypothetical protein LOC147727
208820_at | CCR | 36117_at | 1.1822 CCR .1137 .2828; 0.8187 CCR .0481 .1307; 0.6916 CCR .0725 .4788 | PTK2 | PTK2 protein tyrosine kinase 2
212231_at | CCR | 32169_at | 1.3503 CCR .0194 .1161; 0.6376 CCR .0545 .3078; 0.3293 CCR .0068 .0121 | FBXO21 | F-box only protein 21
229618_at | CCR | No match | 0.8772 CCR .3791 .2522 | SNX16 | Sorting nexin 16
209033_s_at | CCR | 1512_at | 1.0524 CCR .3541 .2722; 0.7866 CCR .2308 .4579; 0.3686 CCR .0303 .0066 | DYRK1A | Dual-specificity tyrosine-(Y)-phosphorylation regulated kinase 1A
200641_s_at | CCR | 34642_at | 1.0255 CCR .4269 .3071; 0.9139 CCR .3670 .3979; 1.1687 Fail .6451 .6822 | YWHAZ | Tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide
202657_s_at | Fail | 37312_at | 1.0611 CCR .6674 .6212; 1.3910 Fail .1106 .2672; 3.4313 Fail .0050 .1299 | SERTAD2 | SERTA domain containing 2
201099_at | CCR | No match | 0.9324 CCR .3890 .3773; 0.3949 CCR .0241 .0022 | USP9X | Ubiquitin-specific protease 9, X-linked (fat facets-like, Drosophila)
201542_at | CCR | No match | 0.4317 CCR .0086 .1713; 0.2702 CCR .0050 .0048 | SARA1 | SAR1a gene homolog 1 (S. cerevisiae)
227068_at | CCR | No match | 0.5209 CCR .0565 .0508 | PGK1 | Phosphoglycerate kinase 1
213944_x_at | CCR | 41476_at | 0.8669 Fail .8518 .7282; 0.9231 CCR .4079 .3152; 0.4148 CCR .0360 .4760 | GNA11 | Guanine nucleotide binding protein (G protein), alpha 11 (Gq class)
201472_at | CCR | 171_at | 1.1376 CCR .1773 .1634; 0.7118 CCR .0734 .4987; 0.6076 CCR .1698 .1764 | VBP1 | von Hippel-Lindau binding protein
202806_at | CCR | 37981_at | 1.8523 CCR .0001 .0007; 0.7634 CCR .0785 .3280; 0.4670 CCR .0244 .2112 | DBN1 | drebrin 1
221918_at | CCR | No match | 0.5827 CCR .0124 .1428; 0.4768 CCR .0492 .0195 | PCTK2 | PCTAIRE protein kinase 2
214585_s_at | Fail | 32658_at | 0.9215 Fail .2743 .2793; 1.3231 Fail .0597 .0929; 4.3420 Fail .0448 .0391 | VPS52 | Vacuolar protein sorting 52 (yeast)
219078_at | Fail | No match | 1.0618 Fail .3451 .3739; 2.9254 Fail .0090 .2201 | GPATC2 | G patch domain containing 2
219133_at | Fail | No match | 2.3405 Fail .0612 .0998 | FLJ20604 | Hypothetical protein FLJ20604
1558111_at | Fail | No match | 2.0333 Fail .0287 .1407 | MBNL1 | Muscleblind-like (Drosophila)
221773_at | CCR | No match | 0.7483 CCR .0001 .4428; 0.5803 CCR .0046 .3046 | ELK3 | ELK3, ETS-domain protein (SRF accessory protein 2)
1558732_at | CCR | No match | 0.3631 CCR .0015 .0369 | | gb:AK074900.1/DB_XREF = gi:22760646/TID = Hs2.382077.1/CNT = 11/FEA = mRNA/...
212441_at | CCR | 37748_at | 1.4004 CCR .0118 .0312; 0.5488 CCR .0010 .1296; 0.3673 CCR .0171 .0250 | KIAA0232 | KIAA0232 gene product
226775_at | CCR | No match | 0.3359 CCR .0221 .0963 | e(y)2 | e(y)2 protein
208498_s_at | Fail | 36680_at | 0.9858 Fail .4582 .7287; 1.0305 Fail .4201 .0957; 1.6438 Fail .0508 .3211 | AMY2B | Amylase, α 2B; pancreatic
201121_s_at | CCR | 38802_at | 1.1054 CCR .2349 .1222; 0.3499 CCR .0007 .0377; 0.3753 CCR .0256 .0257 | PGRMC1 | Progesterone receptor membrane component 1
202984_s_at | CCR | No match | 0.8270 CCR .1357 .3122; 0.3916 CCR .0152 .0197 | BAG5 | BCL2-associated athanogene 5
210338_s_at | Fail | No match | 1.4333 Fail .0590 .1634; 1.6945 Fail .1165 .1925 | HSPA8 | Heat shock 70 kDa protein 8
206548_at | CCR | No match | 0.7866 CCR .0156 .3473; 0.4007 CCR .0003 .0025 | FLJ23556 | Hypothetical protein FLJ23556

NOTE. P values are one sided and uncorrected for multiple testing.

Abbreviations: ID, identification; POG, Pediatric Oncology Group; COALL, German Cooperative Study Group for Childhood ALL; ALL, acute lymphoblastic leukemia; DCOG, Dutch Childhood Oncology Group; CCR, complete continuous remission; Fail, relapse.
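
The NOTE to Table A4 states that its P values are uncorrected for multiple testing. As a rough illustration of what a correction involves, the sketch below applies the Benjamini-Hochberg step-up adjustment, a method that appears in this article's reference list (reference 11). It is not a reconstruction of any analysis the authors performed on this table; the function name and the input values are made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative P values only; they are not taken from a specific table column.
    example = [0.01, 0.04, 0.03, 0.005]
    for raw, adj in zip(example, benjamini_hochberg(example)):
        print(f"raw P = {raw:.4f}, adjusted P = {adj:.4f}")
```

With 47 probe sets tested per data set, an adjustment of this kind raises every P value, so nominally significant entries should be read with that in mind.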

Table A5.

Validation of the Outcome Signature on DCOG Data Set Using Logistic Regression

Model | Odds Ratio | P | Odds Ratio (Multivariate Adjusting for Age and WBC) | P | Odds Ratio (Multivariate Adjusting for ALL Subtype) | P
I (LP1) | 1.233 | .011 | 1.175 | .059 | 0.653 | .874
II (LP2) | 1.016 | .015 | 1.011 | .079 | 0.744 | .898
III (LP3) | 1.078 | .046 | 1.054 | .139 | 0.781 | .771

Abbreviations: DCOG, Dutch Childhood Oncology Group; ALL, acute lymphoblastic leukemia.


Supported by Grants No. U01 CA114762, CA21765 (W.Y. and M.V.R.), and CA51001 (W.Y. and M.V.R.) from the National Cancer Institute; Director's Challenge Grant No. U01 CA88361 (C.L.W., W.L.C.); by the Penelope London Foundation; the Friedman Fund for Childhood Leukemia; the Walter Family Pediatric Leukemia Fund; the Garrett B. Smith Foundation (N.P.M.); the Pediatric Cancer Foundation; the Dutch Cancer Society and the Pediatric Oncology Foundation of Rotterdam (M.L.D., R.X.M., and R.P.); the Center of Medical Systems Biology, established by the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research (R.X.M.); Grants No. U01 GM61393 and U01 GM61374 from the National Institutes of Health National Institute of General Medical Sciences Pharmacogenetics Research Network and Database (W.Y. and M.V.R.); and the American-Lebanese-Syrian Associated Charities (W.Y. and M.V.R.).

R.P. reports on behalf of the Dutch Childhood Oncology Group, The Hague, the Netherlands; M.A.H. reports on behalf of the German Cooperative Study Group for Childhood ALL, Hamburg, Germany.

Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.


1. Pui CH, Evans WE: Treatment of acute lymphoblastic leukemia. N Engl J Med 354:166-178, 2006.
2. Schultz KR, Pullen DJ, Sather HN, et al: Risk and response-based classification of childhood B-precursor acute lymphoblastic leukemia: A combined analysis of prognostic markers from the Pediatric Oncology Group (POG) and Children's Cancer Group (CCG). Blood 109:926-935, 2007.
3. Gaynon PS: Childhood acute lymphoblastic leukaemia and relapse. Br J Haematol 131:579-587, 2005.
4. Alizadeh AA, Eisen MB, Davis RE, et al: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403:503-511, 2000.
5. Hayes DN, Monti S, Parmigiani G, et al: Gene expression profiling reveals reproducible human lung adenocarcinoma subtypes in multiple independent patient cohorts. J Clin Oncol 24:5079-5090, 2006.
6. Golub TR, Slonim DK, Tamayo P, et al: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531-537, 1999.
7. Yeoh EJ, Ross ME, Shurtleff SA, et al: Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1:133-143, 2002.
8. Seibel NL, Steinherz PG, Sather HN, et al: Early postinduction intensification therapy improves survival for children and adolescents with high-risk acute lymphoblastic leukemia: A report from the Children's Oncology Group. Blood 111:2548-2555, 2008.
9. Helman P, Veroff R, Atlas SR, et al: A Bayesian network classification methodology for gene expression data. J Comput Biol 11:581-615, 2004.
10. Tibshirani R, Hastie T, Narasimhan B, et al: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A 99:6567-6572, 2002.
11. Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Statist Soc B 57:289-300, 1995.
12. Hosmer D, Lemeshow S: Applied Logistic Regression (ed 2). Hoboken, NJ, John Wiley and Sons Inc, 2000.
13. Martin SB, Mosquera-Caro MP, Potter JW, et al: Gene expression overlap affects karyotype prediction in pediatric acute lymphoblastic leukemia. Leukemia 21:1341-1344, 2007.
14. Holleman A, Cheok MH, den Boer ML, et al: Gene-expression patterns in drug-resistant acute lymphoblastic leukemia cells and response to treatment. N Engl J Med 351:533-542, 2004.
15. Kamps WA, Bokkerink JP, Hakvoort-Cammel FG, et al: BFM-oriented treatment for children with acute lymphoblastic leukemia without cranial irradiation and treatment reduction for standard risk patients: Results of DCLSG protocol ALL-8 (1991-1996). Leukemia 16:1099-1111, 2002.
15a. Ashburner M, Ball CA, Blake JA, et al: Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25:25-29, 2000.
16. Newman JC, Weiner AM: L2L: A simple tool for discovering the hidden significance in microarray expression data. Genome Biol 6:R81, 2005.
17. Nachman JB, Sather HN, Sensel MG, et al: Augmented post-induction therapy for children with high-risk acute lymphoblastic leukemia and a slow response to initial therapy. N Engl J Med 338:1663-1671, 1998.
18. Flotho C, Coustan-Smith E, Pei D, et al: A set of genes that regulate cell proliferation predicts treatment outcome in childhood acute lymphoblastic leukemia. Blood 110:1271-1277, 2007.
19. Flotho C, Coustan-Smith E, Pei D, et al: Genes contributing to minimal residual disease in childhood acute lymphoblastic leukemia: Prognostic significance of CASP8AP2. Blood 108:1050-1057, 2006.
20. Brisco MJ, Sykes PJ, Dolman G, et al: Early resistance to therapy during induction in childhood acute lymphoblastic leukemia. Cancer Res 60:5092-5096, 2000.
21. Borowitz MJ, Pullen DJ, Shuster JJ, et al: Minimal residual disease detection in childhood precursor-B-cell acute lymphoblastic leukemia: Relation to other risk factors. A Children's Oncology Group study. Leukemia 17:1566-1572, 2003.
22. Holleman A, den Boer ML, Cheok MH, et al: Expression of the outcome predictor in acute leukemia 1 (OPAL1) gene is not an independent prognostic factor in patients treated according to COALL or St Jude protocols. Blood 108:1984-1990, 2006.
23. Lugthart S, Cheok MH, den Boer ML, et al: Identification of genes associated with chemotherapy cross resistance and treatment response in childhood acute lymphoblastic leukemia. Cancer Cell 7:375-386, 2005.
24. Fan C, Oh DS, Wessels L, et al: Concordance among gene-expression-based predictors for breast cancer. N Engl J Med 355:560-569, 2006.
25. Staal FJ, Cario G, Cazzaniga G, et al: Consensus guidelines for microarray gene expression analyses in leukemia from three European leukemia networks. Leukemia 20:1385-1392, 2006.
