To identify differentially expressed genes in large granular lymphocytic (LGL) leukemia, we performed microarray analysis using the UniGEM-V microarray from IncyteGenomics and the HU6800 oligonucleotide array from Affymetrix. In the course of our analysis, we encountered several problems that could also arise in other studies and lead to false conclusions.
Approximately 80 up-regulated genes and 12 down-regulated genes were identified by cDNA microarray analysis in leukemic LGL cells. Because microarray technology was a new tool at that time, we decided to verify the sequences of the genes that were differentially expressed. To that end, we purchased approximately 20 clones representing the differentially expressed genes and verified their sequences. We found that only approximately 70% of the genes spotted on the microarray matched the sequences of the corresponding clones. Other groups have reported similar observations. For example, Halgren et al. purchased approximately 1200 IMAGE mouse cDNA clones from Research Genetics (Huntsville, Alabama), verified their sequences, and found that only 62% could be definitely identified as pure samples of the correct clones. In another study, PCR amplification products (previously sequence-verified cDNA clones) were re-sequenced and only 79% of the clones matched the original database [12]. In a different study, it was estimated that only 80% of the genes in a set of microarray experiments were correctly identified [5]. Therefore, we advise that when preparing cDNA microarrays (commercial or homemade), each clone be sequence-verified at the final stage before the microarray is printed. Mistakes made at this stage cannot be corrected later, even with the most sophisticated analytical tools.
We used cDNA microarray analysis to compare the gene expression profile of leukemic LGL cells obtained from a patient with that of PBMC obtained from a normal healthy individual as a control. We then sought to verify the microarray results in samples from more patients using other methods such as PCR, Northern blot and RNase protection assay. To our surprise, none of the three down-regulated genes studied exhibited differential expression in Northern blots when cDNA fragments of these genes were used as probes. Among the up-regulated genes, only 47% supported the microarray data. The rest either displayed no signal, were not detectable in any sample or failed to reveal any differential expression. Although some genes such as PAC-1 and A20 showed differential expression in LGL leukemia patients, no amplification product was obtained using RT-PCR with gene-specific primers.
By microarray analysis, it is very difficult to distinguish between two similar genes. The best example in our case is the comparison of granzyme B and granzyme H. These two genes share approximately 80% similarity at the DNA level but have different enzymatic activities [13]. Using either one of the genes as a probe, both cDNA microarray and Northern blot analysis indicated over-expression of both genes indiscriminately (Fig. ). However, using gene-specific probes in an RNase protection assay, we were able to distinctly identify the over-expression of both granzyme B and granzyme H in leukemic LGL cells (Fig. and ). In normal PBMC only trace amounts of both genes were identified, but after activation by PHA and IL2 only granzyme B was up-regulated. It is very difficult to obtain this information by microarray analysis alone. Therefore, we advise caution in presenting microarray data without verification and confirmation.
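The cross-hybridization risk between closely related genes can be illustrated with a simple identity check. The following is only a minimal sketch, not part of the original analysis: the two sequences are invented toy strings (not real granzyme sequences), the function names are hypothetical, and the 75% threshold is an arbitrary assumption chosen for illustration. It assumes the sequences have already been aligned to equal length.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def may_cross_hybridize(seq_a: str, seq_b: str, threshold: float = 75.0) -> bool:
    """Flag a pair whose identity meets an (assumed, illustrative) threshold."""
    return percent_identity(seq_a, seq_b) >= threshold


# Invented 20-nt toy sequences differing at 4 positions (80% identity).
toy_gene_1 = "ATGCAACCAATCCTGCTTCT"
toy_gene_2 = "ATGAAACCCATCCTGTTTCA"

identity = percent_identity(toy_gene_1, toy_gene_2)
flagged = may_cross_hybridize(toy_gene_1, toy_gene_2)
print(f"identity = {identity:.1f}%, flagged = {flagged}")
```

A pair flagged in this way would be a candidate for confirmation with gene-specific probes, as was done here with the RNase protection assay.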
When the results from the two microarray technologies (cDNA and oligonucleotide arrays) were compared, the differential expression of some genes agreed, but a large variation in expression profiles between the two microarrays was clearly evident. Such systematic differences between the two technologies have been reported previously [6]. For example, perforin showed a 103-fold change in the Affymetrix array, whereas the cDNA microarray showed a balanced differential expression of only 3.8-fold. Northern blot results indicate that the genes were over-expressed, but the actual values lie between those from the two microarrays. This discrepancy may be due to an inaccurate fold change calculation caused by the inclusion of mismatch (MM) values in the formula. We also observed that many over-expressed genes were not properly identified, which may likewise result from the use of mismatch values in the Affymetrix system. For example, the genes for human autoantigen and human carboxyl ester lipase-like protein would be considered up-regulated in the microarray (according to perfect match (PM) hybridization) if the MM hybridization values were ignored in the fold change calculation.
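The effect of mismatch values on a fold change can be sketched numerically. The exact formula used by the Affymetrix software is not given in the text, so the example below assumes a simplified "average difference" summary (the mean of PM minus MM over the probe pairs) and compares it with a PM-only summary. All intensities are invented for illustration, and the function names and the flooring constant are hypothetical.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of (PM - MM) over probe pairs; a simplified, assumed summary."""
    return mean(p - m for p, m in zip(pm, mm))


def fold_change(test, control, floor=1.0):
    """Ratio test/control, flooring both values to avoid zero or negative signals."""
    return max(test, floor) / max(control, floor)


# Invented probe-pair intensities for one gene (not data from this study).
pm_leukemic = [900, 1100, 1000, 1000]
mm_leukemic = [860, 1050, 960, 970]
pm_normal = [300, 350, 250, 300]
mm_normal = [260, 300, 210, 270]

# PM-only summary: the gene appears up-regulated.
fc_pm_only = fold_change(mean(pm_leukemic), mean(pm_normal))

# PM - MM summary: high mismatch signal cancels the apparent increase.
fc_pm_minus_mm = fold_change(
    average_difference(pm_leukemic, mm_leukemic),
    average_difference(pm_normal, mm_normal),
)

print(f"PM only: {fc_pm_only:.2f}-fold, PM - MM: {fc_pm_minus_mm:.2f}-fold")
```

With these made-up values, the same gene is called about 3.3-fold up-regulated from PM signals alone but unchanged once MM signals are subtracted, which is the kind of disagreement described above.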
DNA microarray analysis can be a powerful technique for identifying differentially expressed genes, but differentiating between splice variants can be problematic. For example, although the differential expression of several genes such as PAC-1 and A20 was confirmed by Northern blot analysis, we were unable to detect any expression of the corresponding proteins by Western blot analysis. We were also unable to amplify those genes by RT-PCR using gene-specific primers. After screening the LGL library, we obtained several full-length genes that differed from PAC-1 at both the 5' and 3' ends. Similarly, we screened an LGL leukemia library and obtained several 1.5 kb cDNA fragments using the A20 cDNA as a probe. The deduced amino acid sequences of these genes revealed different proteins.
We found an up-regulation of NKG2C with a balanced differential expression of 5.8 in the cDNA microarray (Fig. ). When Northern blot analysis was performed using NKG2C cDNA as a probe, we identified multiple transcripts. Screening the LGL leukemia library resulted in the identification of several other members of the NKG2 family such as NKG2A, D, E, and F. Therefore, it can be very difficult to distinguish different forms of genes if they are similar in certain sequence regions.