The Affymetrix exon arrays offer a significant increase in content and greater utility than the 3' expression arrays. The distribution of probe sets across each exon for some 28,800 genes allows the mapping of splice variants. The arrays feature a new design with no mismatch probes, and a random-primed protocol that generates sense DNA targets along the entire length of the transcript. These changes raised uncertainty regarding array performance, but transcript expression has been shown to be comparable with the 3' platforms, with close agreement and similar sensitivity [30]. The minimal price difference between the platforms makes the exon array an attractive alternative. This pilot study was designed to determine the practicality of implementing these arrays in our laboratory.
Every component of the exon array methodology, from reagents and protocols to quality control and analysis software, differs from that of the previous 3' arrays. Both labeling methodologies take approximately 2 1/2 days with 2 overnight steps (an in vitro transcription step followed by hybridization on the second night). However, the exon protocol has more steps and requires considerably more hands-on time (we estimate double), especially on the second day. Because of the random-primed labeling strategy, a ribosomal RNA reduction step is required prior to RNA labeling to limit the impact of rRNA on amplification and labeling. This necessitates the purchase of special equipment for the magnetic bead separation. A scanner autoloader is also required, as scan times increased from 12 to 30 minutes per GeneChip with the change in platform. In our laboratory we were able to label 8 samples manually per run, whereas for the 3' arrays we could do 24. The downstream processing of data is also a challenge. Data sets are much larger and the analysis is performed at several different levels: exon (for alternate splicing) or transcript (for expression levels), using either the core, full or extended probe sets [13].
The quality control checks for monitoring the labeling protocol and the hybridized GeneChip were extremely useful. The former can prevent poor samples from being run on an array, allowing the labeling to be repeated. The latter prevented the inclusion of technical outliers in the statistical analysis. All our samples easily met the QC criteria. The only disparate data were differences in % rRNA reduction. Average reduction was lowest in D0 samples (56%), peaked on D1 (69%), and then declined toward baseline (67% on D3 and D5, 61% on D7). These differences were most likely due to the increased levels of protein synthesis expected for cell proliferation.
The corroboration of both transcript expression and alternate exon usage data using a different analytical platform adds weight to the value of exon arrays as an analytical tool. Our validation rates by real-time PCR were 86% for expression data and 71% for alternate exon usage results. Validation rates for alternate exon usage in other studies using Affymetrix exon arrays range from 21 to 84% [33]. Some of this variation can probably be attributed to differences in data filtration and the absence of visual inspection of the results, emphasizing the importance of best-practice methodologies. The alternate exon usage in many of the transcript clusters identified in this study showed greater complexity than a single exon inclusion or exclusion event, illustrating that more than one alternative splice isoform can be maintained concurrently in the mRNA pool. For this data set, it is not possible to determine whether this reflects changes in the ratios of isoforms associated with physiological variation or changes in the cell sub-populations. It is estimated that more than 75% of genes produce alternative transcripts [36], contributing to functional diversity in the genome. There is therefore little question that alternative splicing is an important component of understanding the complexity of the mammalian transcriptome.
Implementation of the Affymetrix exon arrays requires a considerable input of time, but our results show that the benefits are worth the effort. Currently, the most challenging area is probably the downstream analysis of data, mainly because of the increased data set size.
Much work has been done to characterize the genome-wide transcriptional program of lymphocyte activation and proliferation in a wide range of systems, including cell culture [38], whole blood [39], PBMCs [40] and purified cell populations [38]. This is a critical step toward understanding the biologic processes involved. The interpretation of expression patterns from mixed cell populations is complicated by variation in the relative proportions of the cell subsets. We cannot distinguish genes that undergo modest changes in a large percentage of cells from those that undergo large changes in a small subpopulation of cells. However, the strength of using mixed cell populations lies in capturing the interactions between these populations. Regulatory functions may be provided by direct cell-to-cell contact or via cytokine secretion from different cells. To understand the regulatory networks underpinning cellular dynamics, both purified and mixed-cell populations need to be studied. The same confounders apply to exon array profiling, along with the possible differential compartmentalization of nuclear and cytoplasmic isoforms [43].
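The ambiguity inherent in mixed cell populations can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the bulk signal is the proportion-weighted average of per-cell expression in each subset; the function name, the cell fractions and the expression values are hypothetical and are not taken from our data.

```python
import math


def bulk_signal(fractions, per_cell_expression):
    """Bulk signal of a mixed population, modeled (as an assumption) as the
    sum over subsets of (subset fraction x per-cell expression)."""
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("subset fractions must sum to 1")
    return sum(f * e for f, e in zip(fractions, per_cell_expression))


# Hypothetical baseline: every cell expresses the gene at 100 arbitrary units.
baseline = bulk_signal([1.0], [100.0])

# Scenario A: a modest 1.5-fold increase in all cells.
modest_in_all = bulk_signal([1.0], [150.0])

# Scenario B: a 3-fold increase confined to 25% of the cells.
large_in_few = bulk_signal([0.75, 0.25], [100.0, 300.0])

fold_a = modest_in_all / baseline
fold_b = large_in_few / baseline

# Both scenarios yield the same bulk fold change, so the array signal alone
# cannot tell them apart.
print(fold_a, fold_b)
```

Under this simplified model the two scenarios are indistinguishable at the level of the bulk measurement, which is why purified populations are needed alongside mixed ones.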
The majority of transcripts that showed differential expression over time (87.5%) did not show a change in alternate exon usage, indicating that distinct regulatory networks are operating. Interestingly, transcripts of constitutive and alternative splicing regulators (the SR family proteins [40], such as SFRS2, SFRS7 and SFRS10) were significantly enriched for alternate exon usage, indicating auto-regulatory organization at the level of transcript splicing. While most of the alternate exon usage events in this study have previously been described, several new splicing patterns were identified, which is another benefit of implementing the exon array platform.