Acute lymphoblastic leukemia (ALL) is the most common pediatric malignancy and has been the poster child for improved cancer therapeutics, with lifetime disease-free survival (LTDFS) rates improving from <10% in 1970 to >80% today. Numerous genetic prognostic variables are known in ALL, including T-cell ALL, the hyperdiploid karyotype, and the translocations t(12;21)[TEL-AML1], t(4;11)[MLL-AF4], t(9;22)[BCR-ABL], and t(1;19)[E2A-PBX]. ALL has been studied at the molecular level through expression profiling, which has yielded unvalidated expression correlates of these prognostic indices. To date, the great wealth of expression data, generated at disparate institutions and representing an extremely large cohort of samples, has not been combined to validate any of these analyses. The majority of these data were generated on the Affymetrix platform, which potentially makes data integration and validation on independent sample sets possible. Unfortunately, because the array platform has evolved over the past several years, the arrays themselves carry different probe sets, making direct comparisons difficult.
To test the comparability of different array platforms, we accumulated all Affymetrix ALL array data available in the public domain, as well as two sets of cDNA array data. We supplemented this data pool by profiling additional diagnostic pediatric ALL samples in our laboratory. Lists of genes that are differentially expressed in the six major subclasses of ALL have previously been reported in the literature as possible predictors of subclass.
We validated the predictive power of these gene lists on all of the independent datasets, which were accumulated from various laboratories and generated on various array platforms, by blindly distinguishing the prognostic genetic variables of ALL. Cross-generation array validation was successful, with the gene predictors showing high sensitivity and high specificity for the prognostic variables. We were also able to validate the gene predictors with high accuracy on an independent dataset generated on cDNA arrays.
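As an illustration of how sensitivity and specificity can be computed for one subclass in a blinded validation of this kind, the following is a minimal Python sketch. The function name, the subclass labels, and the toy sample calls are hypothetical and are not taken from the study's data or analysis pipeline.

```python
def sensitivity_specificity(true_labels, predicted_labels, subclass):
    """Return (sensitivity, specificity) for one subclass, treated one-vs-rest."""
    tp = fn = fp = tn = 0
    for truth, call in zip(true_labels, predicted_labels):
        if truth == subclass:
            if call == subclass:
                tp += 1  # subclass sample correctly called
            else:
                fn += 1  # subclass sample missed
        else:
            if call == subclass:
                fp += 1  # other sample wrongly called as the subclass
            else:
                tn += 1  # other sample correctly excluded
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy, made-up labels for six samples (illustration only).
true_labels = ["TEL-AML1", "TEL-AML1", "T-ALL", "BCR-ABL", "T-ALL", "hyperdiploid"]
predicted_labels = ["TEL-AML1", "T-ALL", "T-ALL", "BCR-ABL", "T-ALL", "hyperdiploid"]

for subclass in ("TEL-AML1", "T-ALL"):
    sens, spec = sensitivity_specificity(true_labels, predicted_labels, subclass)
    print(subclass, sens, spec)
```

Each subclass is scored against all remaining samples, so a predictor can be highly sensitive for one subclass while losing specificity through false calls on another.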
Inter-array comparisons such as this one will further enhance the ability to integrate data from several generations of microarray experiments and will help break down barriers to assimilating existing datasets into a comprehensive data pool.