Microarray analysis of global gene expression (GE) has led to rapid advances in our understanding of various physiological and pathological processes. Although many hundreds of studies have been done, doubts have been raised about the reproducibility and applicability of these data. Inter-study variability can be attributed to differing probes on the arrays, different protocols for RNA extraction, labeling and hybridization, and differences in the quality of cells. Despite these factors, a number of studies have demonstrated that microarray results are reproducible across platforms and laboratories, though most used the same source of RNA for these analyses. The MicroArray Quality Control (MAQC) consortium was formed to address these questions and recently reported that reproducibility can be enhanced by better matching of microarray probes between platforms. They concluded that matching probe sets within the same exons and using similar experimental protocols can lead to more reproducible results on the major commercial microarray platforms. Our results take these findings a step further and demonstrate that GE studies done using different platforms and distinct sources of material have the power to discriminate between biologically distinct tissues and thus can also be used to address scientific questions. Earlier attempts to address study-specific biases have used statistical algorithms, including ANOVA-based correction models. We did not use these algorithms because we found adequate discrimination between biologically distinct tissues, which indicates that the differences in gene expression are large enough to be detected even in the presence of possible study-specific biases. It is possible, however, that some of the more subtle results seen in our analysis may prove artifactual once these biases have been removed by appropriate methods.
Furthermore, this meta-analysis can be accomplished simply by using UniGene and RefSeq identifiers as common variables between array platforms, though UniGene was slightly better at achieving this discrimination in our dataset. This difference between UniGene and RefSeq results, albeit small, is likely due to the different methods each database uses to identify and assign transcripts, and has also been observed in prior studies. Although we did observe variability due to different laboratory protocols, as seen in previous studies, the stronger correlation between tissues with similar sources of cells outweighed this limitation and made the meta-analysis scientifically useful.
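The idea of using a shared transcript identifier as the common variable between platforms can be illustrated with a minimal sketch. It is not the pipeline used in this study: the probe names, the `Hs.*` identifiers, the expression values, the choice to average probes that map to the same identifier, and the use of Pearson correlation are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def collapse_to_gene(probe_values, probe_to_gene):
    """Average probe-level values that map to the same gene identifier."""
    buckets = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:  # probes without an identifier are dropped
            buckets[gene].append(value)
    return {gene: sum(vals) / len(vals) for gene, vals in buckets.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def cross_platform_correlation(sample_a, sample_b):
    """Correlate two gene-level profiles over the identifiers they share."""
    shared = sorted(set(sample_a) & set(sample_b))
    r = pearson([sample_a[g] for g in shared], [sample_b[g] for g in shared])
    return r, shared


# Hypothetical probe-level data from two different array platforms.
platform_a = {"a1": 8.0, "a2": 10.0, "a3": 3.0, "a4": 5.0}
a_to_unigene = {"a1": "Hs.1", "a2": "Hs.1", "a3": "Hs.2", "a4": "Hs.3"}
platform_b = {"b1": 4.5, "b2": 1.5, "b3": 2.5, "b4": 7.0}
b_to_unigene = {"b1": "Hs.1", "b2": "Hs.2", "b3": "Hs.3", "b4": "Hs.9"}

genes_a = collapse_to_gene(platform_a, a_to_unigene)
genes_b = collapse_to_gene(platform_b, b_to_unigene)
r, shared = cross_platform_correlation(genes_a, genes_b)
```

Only identifiers present on both platforms enter the comparison, so the choice of identifier system (UniGene or RefSeq) determines which probes are compared at all. This is one plausible way a small difference between the two systems could arise.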
Our study demonstrated that results obtained through this approach can be reconciled with the biology of hematopoietic cells and their malignancies. For example, samples from acute myeloid leukemia and myelodysplasia (MDS) were found to be transcriptionally closer to normal hematopoietic cells than to non-hematopoietic cells, even though these studies were done in many different laboratories. MDS is a preleukemic disorder of varying grades of pathology and can have an indolent course in most patients. The fact that MDS samples clustered with normal hematopoietic samples in some cases shows that our analysis can recover biological relationships even between studies performed with different experimental protocols and in different laboratories.
After demonstrating that our approach can be used to biologically characterize sources of cells, we attempted to use this database to discover gene signatures characteristic of hematopoietic progenitor and stem cells. Because of the heterogeneity of our source dataset, we imposed very stringent criteria to identify these genes. Of the 349 genes that were differentially expressed in normal progenitors, 124 were differentially expressed in diseased hematopoietic cells, suggesting that hematologic malignancies disrupt important functional genes. Our search strategy yielded several genes that were consistently enriched in normal hematopoietic GE datasets and that functional pathway analysis linked to cell cycle, growth, development and hematopoiesis. Recent studies have supported similar comparative approaches for more accurate and valid gene target discovery. Two recent seminal studies searched for gene signatures of stem cells by comparing genes enriched in hematopoietic, neural and embryonic stem cells and arrived at totals of 283 and 230 common ‘stemness’ genes, respectively. Even though the experimental techniques and cell types in these two papers were similar, an initial comparative analysis showed that only 7 ‘stemness’ genes were common to the two studies. Comparison with a subsequent third analysis showed even less overlap, with only one gene consistently enriched across the three independent but similar studies. Repeat analysis using different statistical methods did increase the gene overlap, but the final conclusion was that gene array studies of stem cells are influenced by cell purity and can be contaminated by a high level of non-specific observations. Consequently, the authors determined that genes commonly expressed among different studies may be better representatives of functionally important stemness genes. Thus, meta-analytical approaches may be a way to separate functionally important information from experimental noise. Because the genes discovered by our analysis are common across an extremely variable dataset, they may have a high chance of being characteristic of human HSCs. Most importantly, alpha-6 integrin, the one gene that was found to be enriched in all three murine stem cell studies, is similar to alpha-4 integrin, which was found to be enriched in our human dataset. Both of these integrins are known to be expressed on the surface of HSCs and are implicated in cell migration and homing to the bone marrow. The functional similarities between these two integrins and the concurrence of our findings with three landmark stemness gene studies published in the literature support our analytical approach.
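The overlap reasoning above, in which genes reported by several independent studies are treated as stronger candidates than genes reported by one, can be sketched as a simple count of how many studies list each gene. The study labels and gene names below are placeholders rather than data from the cited papers, and the threshold is an illustrative assumption.

```python
from collections import Counter


def consensus_genes(study_gene_lists, min_studies):
    """Return genes reported by at least `min_studies` of the studies."""
    counts = Counter(
        gene
        for genes in study_gene_lists.values()
        for gene in set(genes)  # count each gene once per study
    )
    return sorted(gene for gene, n in counts.items() if n >= min_studies)


# Hypothetical enriched-gene lists from three independent studies.
studies = {
    "study_1": ["GENE_A", "GENE_B", "GENE_C"],
    "study_2": ["GENE_A", "GENE_B", "GENE_D"],
    "study_3": ["GENE_A", "GENE_E"],
}

in_all_three = consensus_genes(studies, min_studies=3)
in_at_least_two = consensus_genes(studies, min_studies=2)
```

Raising `min_studies` trades sensitivity for specificity: fewer genes survive, but those that do are less likely to reflect study-specific noise.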
Our analysis also yielded a set of genes not previously implicated in hematopoiesis. Some of these genes have interesting functions and may be regulators of HSC function. SMARCE (SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily e/BAF57) is a key member of the mammalian SWI/SNF chromatin remodeling complex that is involved in transcriptional regulation. SMARCE has been shown to mediate the interaction between the chromatin remodeling complex and transcription factors and thus could be partly responsible for the unique chromatin associated with stem cells. Lyn kinase is a member of the Src family of kinases that has been implicated in granulopoiesis and erythropoiesis, and it needs further exploration as a stem cell marker. Septin 6 is a member of a class of proteins involved in cell division, membrane trafficking and cytoskeletal organization. The roles of septins in hematopoietic stem cells remain unexplored. Amyloid beta precursor protein is a cell surface protein with signal-transducing properties, and it is thought to play a role in the pathogenesis of Alzheimer's disease. This protein can activate NEDD8, a ubiquitin-like protein required for cell cycle progression through the S/M checkpoint, and thus could be involved in cell cycle control of hematopoietic stem cells. The protein Dp-2 (E2F dimerization partner 2) belongs to a family of transcription factors that play an essential role in regulating cell cycle progression. These transcription factors regulate the expression of numerous critical genes (e.g. cyclin E, CDC2, cyclin A, B-Myb, E2F1, and p107) involved in cell cycle progression, as well as several enzymes (DNA polymerase α, thymidine kinase, and dihydrofolate reductase) required for DNA replication. Thus, Dp-2 could plausibly be involved in stem cell regulation. In summary, our analytical approach provides a list of interesting genes for further scientific and functional validation. Additionally, this dataset can serve as an online resource for stem cell and hematology researchers, providing a control database for comparison with disease-state GE profiles generated in their own laboratories.