|Home | About | Journals | Submit | Contact Us | Français|
Understanding the biologic significance of alternative splicing has been impeded by the difficulty in systematically identifying and validating transcript isoforms. Current exon array workflows suggest several different filtration steps to reduce the number of tests and increase the detection of alternative splicing events. In this study, we examine the effects of the suggested pre-analysis filtration by detection above background P value or signal intensity. This is followed post-analytically by restriction of exon expression to a fivefold change between groups, limiting the analysis to known alternative splicing events, or using the intersection of the results from different algorithms. Combinations of the filters are also examined. We find that none of the filtering methods reduces the number of technical false-positive calls identified by visual inspection. These include edge effects, nonresponsive probe sets, and inclusion of intronic and untranslated region probe sets into transcript annotations. Modules for filtering the exon microarray data on the basis of annotation features are needed. We propose new approaches to data filtration that would reduce the number of technical false-positives and therefore, impact the time spent performing visual inspection of the exon arrays.
Alternative splicing is quite common, having been described in about 70% of all human multi-exon genes.1,2 It is a key step in the formation of proteomic diversity, and understanding this diversity is essential to the discovery of disease biomarkers and therapeutic targets. A deeper knowledge should also aid in the understanding of many fundamental biological processes. Technologically, the new generation of high-density, exon-centric microarrays allows for an increasingly accurate and complete assessment of global gene expression. This facilitates the study of known splicing events as well as the discovery of novel splice variants.3–5 Affymetrix Exon 1.0 ST arrays target annotated and predicted exons in the human, mouse, or rat genomes and are therefore, very different from the previous generation of 3′ in vitro transcription (IVT) arrays. Design changes include the removal of mismatched probes, replacing them with genomic and antigenomic background probes. The background signal for exon arrays is determined by the median response of matched background probes with similar guanine and cytosine (GC) content to the probe in question. These are used in “detection above background” (DABG) calculation, replacing the present/marginal/absent calls of the 3′ arrays. Another change involves target generation. Exon arrays methodology uses T7-linked, random hexamers for cDNA synthesis, generating a DNA target, which on array hybridization, forms a DNA/DNA duplex, quite different compared with the RNA/DNA heteroduplex from the oligo-dT-linked T7 labeling of the IVT arrays. The changes within this new format raised uncertainty with regard to the performance of these arrays. This has been dispelled by the publication of several papers, showing that the gene expression levels of the exon array perform comparably with the 3′-targeted platforms, giving close agreement and the same sensitivity.6–8 The design changes impact several of the pre-analytical methods, and considerable effort is being made to improve and understand all aspects of exon array data processing. For the pre-analysis steps, different modeling algorithms have been examined for estimating probe background intensities using probe content.9,10 Also, the summarization of the raw exon array data is generally performed by one of two algorithms, Robust Multichip Average (RMA)11 or Probe Logarithmic Intensity Error.12 It has been shown that this does not affect the efficacy of statistical tools in detecting alternative splicing events.13 The exon array data analysis is performed along two parallel tracks, one to obtain gene-level data and the other for exon-level data. Nothing changes with regard to the gene-level analysis once the data have been summarized. Exon-level data require specialized algorithms to look for possible alternative splicing events, and several publications have reported on this,14–17 and analytical workflows have been proposed.13,18,19 These all include various levels of data filtration (in an attempt to moderate multiple testing errors) and require the visual inspection of probe-set intensities mapped onto transcript information as a way of assessing the true discovery rate.13,18,20 Although several suggestions have been made as to what filtration steps can be implemented, no study as yet has looked at the impact of combining the different filtration steps, and none has reported on the results of visual inspections. We use an experimental data set generated in our laboratory and document the effects of different filtration steps. We categorize the transcript clusters (TCs) identified as statistically significant into four categories (from false-positive to true-positives) using visual assessment. From this, we are able to make several recommendations that hopefully, bioinformaticians can integrate into algorithms and workflows being developed for this platform.
Blood samples from three volunteers drawn from the Centers for Disease Control (CDC) volunteer donor pool were obtained after informed consent. All participants were healthy, female, Caucasian, 46–56 years old, and self-reported as having no allergies or hay fever.
Whole blood was collected from each donor in cell preparation tubes (sodium citrate, Becton Dickinson, Franklin Lakes, NJ, USA) and the peripheral blood lymphocytes purified according to the manufacturer's instructions. Lymphocytes were counted, assessed for viability with trypan blue (>98% in all cases), frozen, and stored in liquid nitrogen. For the stimulation analysis, lymphocytes were thawed at 37°C and aliquoted into round-bottom tubes at 1.425 × 107 cells in 14.25 mL RPMI-1640 medium containing 20% FBS. Cells were stimulated by the addition of PHA (1 μg/mL), IL-2 (10 U/mL), and ionomycin (1 μM). The unstimulated tubes had an equal volume of media added to the tube. Cells were harvested and assessed at Day 0 (D0; prior to stimulation) and 3 days after stimulation (D3). Cell counts were performed at each time-point using trypan blue, and 5 × 106 cells were resuspended in 1 mL TRIzol reagent.
Total RNA was extracted from the lymphocytes in TRIzol reagent, according to the manufacturer's instructions. RNA quality and quantity were assessed using the Agilent 2100 bioanalyzer, and all samples had an RNA integrity number >7.5 and yield >5 μg. Total RNA (1 μg) was subjected to ribosomal RNA removal using the RiboMinus human/mouse transcriptome isolation kit (Invitrogen, Carlsbad, CA, USA) and then cDNA synthesized using the whole-transcript sense target labeling assay (Affymetrix®, Santa Clara, CA, USA), including the labeling controls from the GeneChip® eukaryotic poly-A RNA control kit. All methodologies followed those suggested by the manufacturer. The fragmented antisense biotin-labeled cDNA had hybridization buffer, eukaryotic hybridization controls (to confirm the sensitivity of hybridization), and oligo B2 controls (used to orient and grid the array), added just prior to hybridization to the Affymetrix® human exon 1.0 ST array. Hybridization was at 45°C for 17 h, as described in the Affymetrix® Users Manual.21 Following hybridization, the chips were washed and stained with a PE-strepavidin conjugate using the GeneChip® fluidics station with the FS450__0001 protocol. The chips were scanned using the Affymetrix® GeneChip® scanner 3000 7G, and the Affymetrix® GeneChip® operating software was used for the management, sharing, and initial processing of the expression data. All array data files have been deposited in Array Express under Accession Number E-MEXP-884.
Array quality control was performed using Affymetrix® Expression Console™ (v 1.1), all arrays met all quality parameters established by Affymetrix, and no outliers were detected. The DABG summary file was generated for the RMA normalized data using Affymetrix power tools.22
All exon array data were analyzed using tools in Genomics Suite software (v6.4; Partek Inc., St. Louis, MO, USA), using Affymetrix annotation files (NetAffx, Version na28. hg18). Exon-level data were filtered to include only those probe sets that are in the “core” meta-probe list, which represents 17,881 RefSeq genes and full-length GenBank mRNAs. The core data include target-sequence, perfect-match, unique probe sets. The RMA algorithm was used for probe set (exon-level) intensity analysis. Adjustments were made for GC content and probe sequence on pre-background-subtracted values. We used background correction, quantile normalization, log2 transformation, and median polishing for summarization.
Data filtration is required to decrease the chances of false predictions, therefore, increasing the verification rate. Pre-analysis filtration included removing any probe set that is not expressed in at least one sample group. This prevents the lack of expression from being mistaken for alternative splicing. Two different filters were tested. The first used DABG group means P value, where all probe sets were retained with a P value <0.05 (or a second stringency level of <0.01). The second pre-analysis filter used signal intensity as the method of determining probe set inclusion. Probe sets that had a group mean of log2 signal >3 (or >5) were kept for further analysis.
Alternative splice detection was performed using a two-way ANOVA, including time (T) and donor (D) as factors. To detect exons expressing differently, depending on the day of stimulation, the ANOVA model used was as follows:
Where y is the expression of a transcript, μ is the mean expression of the transcript (D is a random effect), E the exon effect (alternative splicing independent to time), T*E an exon expressing differently at different times T (interaction term of alternative splicing and time), S(T,D) a sample effect (a random effect, nested in time and donor), and ε the error term. The analysis is performed at the exon level, but the result is displayed at the transcript level.
All genes represented by <5 probe sets in the TCs were removed, as it is often difficult to interpret alternative exon incorporation patterns with so few markers. Any transcripts not represented by a HUGO gene symbol were also removed, keeping the focus of the analysis on known genes. ANOVA P values were corrected using the conservative Bonferroni method. A list of genes with significant alternative spliced events was generated by using a P < 0.0001 cutoff, resulting in a manageable size list.
Secondary filtering was performed on the summarized TC data. Three different methods were used in combination with the pre-analytical primary filtration options. These were: 1) removing all TCs that had high differential exonic expression (more than fivefold change) between the two groups [fold-change (FC)]. These have a tendency to produce false-positives.18 2) Limiting the analysis to TCs that have known alternative splicing (KAS) events; the number of isoforms for each TC was taken from the overlay of TC information on the genomic data using the RefSeq ID in the University of California Santa Cruz (UCSC) genome browser database.23 3) Using the intersect of results obtained from other algorithms, in this case, Microarray Detection of Alternative Splicing (MiDAS) and the pattern-based correlation algorithm (PAC), as implemented in easyExon (EE).24 Data were loaded into EE as the RMA sketch and DABG summary files generated using Affymetrix power tools. Default values for input settings were used. Filtering on DABG retained summarized values if at least three of six arrays have P values ≤0.05. A log stabilization factor of 16 was added to each summarized value, and the meta-probe set file was the core class. Exon-level data were processed, limiting the probe set number to between 5 and 200. For data filtration, the MiDAS P value was set at ≤0.05 and the PAC correlation coefficient, <0.50 (no FC differences were included).
All results were subjected to a detailed manual analysis using the Partek gene view plots to identify alternative splicing forms and determine the frequency of changes observed in the data set. Each statistically significant TC was designated as yes (Y)—there is a very strong indication that an alternative splicing event has occurred, as it is present in >1 adjacent probe set or concurs with a KAS event, as determined from the gene view in Partek genomics; probable (P)—likely alternative splicing event but is associated with a single probe set; unlikely (U)—the changes that are evident appear to be minor or have high signal variance and may not reflect an alternative splicing event; or no (N)—there is clear evidence of a false-positive call and no other indications of alternative splicing. For each TC, a record was also made of the number of known isoforms, 5′ and 3′ edge effects, probe sets (with or without signal) to intronic sequences, or an untranslated region (UTR), and possible nonresponsive or saturated probe sets. Details of scoring are given in Supplemental file 1.
The exon probe sets that were not expressed in at least one sample group were removed from the data set prior to analysis using 2 filters at two different stringencies:either DABG group mean p-values (> 0.05 or > 0.01) or by signal intensity (log2 signal intensity < 3 or < 5). This table gives details of the results and the visual inspection annotations for each of the significant transcript cluster.
All probe sets showing cross-hybridization were not considered in this analysis, as the core meta-probe set annotation file was used. Affymetrix has categorized these probe sets as unique (perfect match only to the target sequence). This results in a data set of 228,871 probe sets that represents 17,881 TCs. In the pre-analytical filtration, it is important to remove probe sets that are not expressed in one or both groups. We compared removal by signal intensity or DABG P value cutoff. For each methodology, two stringency levels were applied: log2 signal intensity, >3 or >5, and DABG, P < 0.05, or P < 0.01. Table 1 summarizes the results. Post-analysis filtration steps were applied after application of the alternative splicing ANOVA, looking for differences between groups D0 (prestimulation) and D3 (poststimulation). A further reduction in the data is facilitated by only including TCs with ≥5 probe sets. The low-stringency DABG filter resulted in a further 45% reduction in the number of TCs (8049/14,795) and the low-stringency signal intensity filter, a 30.1% reduction (12,047/17,237). The higher stringency filtering removed 6310/11,606 TCs by DABG < 0.01 and 5771/11,306 TCs for log2 signal intensity >5 (Tables 1 and and22).
All TCs that passed a multiple test correction P value < 0.0001 were visually inspected and scored to assure that the statistical determination was not the result of a technical issue (Supplemental file 1). Selections of transcript profiles are illustrated in Figure 1, and the results of the different filtration steps are summarized in Table 2. Comparison of the first-level filtration shows that 8049 TCs passed the DABG P < 0.05 filter, and 361 showed alternative splicing (Table 2). On visual inspection, 19% of these were classified as Y and 11%, N. Of the 866/12,047 showing alternative splicing using a signal >3 filter, 14% was designated Y and 9%, N. The overlap of TCs showing alternative splicing identified by each of these filtration methods was 337. The more stringent cutoffs identified 47/177 TCs by DABG filtration and 37/137 by signal intensity as Y. The N calls were 19 and 22, respectively. Statistical examination of the proportion of TCs visually assessed as Y, in relation to the total number of TCs identified as being alternatively spliced, was performed using a binomial proportion test. This showed no statistically significant differences between these filtration steps (Supplemental file 2).
Visual inspection of all of the data showed several circumstances that could lead to a false estimate of an alternative splicing event. These include: 1) edge effects, where there is a nonresponsive probe set in the 5′ or 3′ exon of the TC (Table 3). 2) Nonresponsive probe sets within the TC. Here, signal intensities in both groups are lower than neighboring probe sets, showing smaller variance across replicate samples (Table 4). 3) Probe sets mapping to intronic sequences within the TC that have equally low signal in each group (Table 4). 4) Probe sets that target the 3′ or 5′ UTR of the target transcript (Table 5). An example of each is included in Figure 1. Closer examination of the log2 signal >3 data showed 53% of TCs had probe sets with a noticeable edge effect (Table 3). There were three times as many 5′ edge effects as 3′, and 75% (56/75) of TCs designated as N (visual confirmation of a false-positive call) had edge effect probe sets. Edge effects were noted in 61% of TCs with obvious alternative splicing events (Y). Similar nonresponsive probe sets can be found within TCs resulting from absent or nonlinear probes or if the exon was not expressed in the tissue being examined. Table 4 gives the incidence of nonresponsive probe sets under the two filtration conditions at the different stringencies. The appearance of probe sets to intronic sequences within the TC is similar to a nonresponsive probe set. Approximately 30% of the TCs identified as having alternative splicing had probe sets to intronic sequences (Table 4). Of these, fewer than 23% resulted in a false-positive call when visually assessed. Interestingly, over one-quarter of the intron probe sets using the low-stringency parameters (and >40% at high stringency) showed signal detection, indicating intron inclusion (Table 4). Less than 10% of TCs identified as being alternatively spliced were noted as having probe sets to the UTR at the 3′ or 5′ end and were included in the alternative splicing analysis. This did impact the results and could easily result in false-positive, alternative splicing calls. For example, in the log2 signal, >3 data of the 75 TCs, classified as N on visual inspection, 14 were false-positive calls, because of a UTR probe set (Supplemental file 1). The presence of signal from these probe sets was noted in several of the TCs (Table 5).
Different post-analytical filtration steps were applied to determine their impact on alternative splicing detection including FC, KAS, or EE (Table 2). Results from the more-stringent pre-analytical filters (DABG, P < 0.01, and log2 signal, >5) were similar with regard to percentages of Y and N calls (Table 2). The less-stringent criteria (DABG, P < 0.05, or log2 signal, >3) gave higher numbers of TCs with alternative splicing events for the signal-based filtration with post-analysis filters FC, KAS, and EE (498, 315, and 278, respectively, from 866 TCs) than the DABG pre-analysis filtration with the same post-analysis filters (188, 138, and 122 from 361). The overlap of TCs identified by each of the three post-analysis filters is illustrated in Figure 2. Application of a binomial proportions test to the numbers of TCs visually assessed as Y at each filtration level showed that Level 1 + FC had a higher detection rate for Y compared with all other filtration combinations (Supplemental file 2).
Several effects cause false-positive results in the statistical analysis of exon expression changes of TCs. Visual inspection is required to assess these artifacts (Additional File 1). N calls generally reflect a statistically significant change that has no biological relevance and can be explained by technically aberrant behavior. The less-stringent pre-analysis filters show that using FC restriction gave a higher percentage of Y calls (Table 2), followed by EE, which were similar to using KAS. Combining the post-analysis filters—FC and EE intersect—gave the highest percentage of Y calls (Table 2), but small numbers of TCs.
Exon array data are analyzed in two separate but parallel tracks. One examines exon-level data, allowing for the determination of alternate exon use. The other uses the summarized exon data for gene-level or differential expression analysis. This paper focuses on the exon-level analysis pre- and post-analysis filtration steps. To obtain meaningful splicing information and reduce the effect of multiple testing, a number of different filtering steps can be applied. Several different methodologies have been suggested.13,18,25 In this analysis, we compared some of the suggested pre- and post-analysis filtration methods. One limitation to this study is the small data set used in the analysis. However, the initial understanding and interpretation of the filtration results were greatly facilitated by the size of this data set. We feel that even with these stated limitations, the results are of great interest and will be informative for those researchers embarking on exon array analysis.
One of the most important pre-analysis filters is the removal of probe sets with the potential for cross-hybridization. Xing et al.15 showed that this was a major cause of false predictions of differential alternative splicing. Fortunately, exon array probe sets are classified based on annotational confidence. The core meta-probe set targets exons of well-annotated RefSeq genes with a hybridization target designation of unique. Thus, probes in a probe set match only one sequence perfectly in the putatively transcribed array design, allowing us to use probe sets that show no cross-hybridization.26 This results in 228,871 probe sets, which generate data for 17,881 TCs (Table 1). Probe set sequence is also impacted by the presence of a single nucleotide polymorphism (SNP), which can cause a false-positive call.27 Partek Genomics Suite provides a list of probe sets containing known SNPs, allowing for them to be filtered out. For this particular data set, the results were not impacted by this filtration. It did reduce the data set to 222,207 probe sets, which grouped into 17,648 TCs. The number of probe sets in a TC can also impact the interpretation of alternative splicing events. We found that when <5 were present in a TC, it was difficult to interpret the results. For this analysis, we used all TCs that had ≥5 probe sets.
In all exon analysis workflows, a mandatory step is the removal of any probe sets that are not expressed in at least one group, preventing the lack of expression being mistaken for alternative splicing. This can be done by filtering, using DABG P values or by setting a log2 signal intensity cutoff value. Different stringencies will determine the number of probe sets passing the filtration condition (Table 1). The decision as to which methodology to use is dependent on the number of targets one wants in the final list. This relates to the aim of the analysis. For example, discovery of new splicing events would not be well served by using the KAS filter. It is clear that the pre-analysis filtration step alone does not reduce the data set to a manageable size for visual examination of alternative splicing. However, as pointed out in several publications, the P value for calling an exon alternatively spliced is not obvious.18 Again, it is dependent on the number of targets one wants to see in the final list. Multiple test correction was applied to the TC P values generated by the ANOVA using the conservative Bonferroni correction. Other methodologies, such as Dunn-Sidak, false discovery rate, and q value, can also be used. We know that this does not change the order of the P values and once again, only impacts the number of TCs in the final list.
In an attempt to increase the percentage of alternative splicing events that were considered likely, three post-analysis filters were implemented. These realized significant data reduction and did impact the number of alternative splicing events considered true-positives by visual inspection (% Y). Comparison of these in relation to the two mandatory pre-analysis, low-stringency filtration steps (DABG or signal intensity) showed FC > EE > KAS. The binomial distribution test showed that Level 1 + FC had a higher Y detection rate compared with all other filtration steps (Supplemental file 2). Other suggestions for second-level filtration include removing TCs with high (>5) or low (<0.2) exon/gene-intensity ratios.18 For this data set, neither of these had any impact in reducing the TC numbers. These ratios are affected by cross-hybridization and nonlinear responses of probe sets, respectively, the former of which have been removed already from the data set.
Several classes of false-positives are caught in the statistical analysis, and manual inspection of the data in a genomic context is required for their detection. For this analysis, we chose to use the UCSC RefSeq database for our annotation.23 Inclusion into this database requires significant verification of isoforms, and it is considered a highly conservative database. Other databases are available and if used, would be expected to change the annotations of our experimental data (Supplemental file 1). The impact of using a different database for annotation is illustrated in Supplemental file 3.
Visual inspection is impacted by the data base chosen for transcript annotation. This figure allows comparisons to be made between annotations using the highly conservative UCSC Refseq database and their Known Genes database. All other legend annotations are the same as for Figure 1 in the manuscript.
Bemmo et al.6 noted that probe sets at the 3′ and 5′ ends of TCs showed response properties that were different from the rest of the transcript. They attributed this to increased GC content in 5′-end probe sets as a result of proximity to unmethylated promoters and CpG islands. They postulate that the 3′ effects are likely caused by fewer random-primed, first-strand synthesis events, resulting in lower signal intensities. Our analysis shows that edge effects are present in >50% of TCs having an alternative splicing event. They account for approximately 75% of the false-positive calls, and 5′ edge effects are clearly out-numbering the 3′ effects (Table 3). Probe-level intensity is known to be significantly dependent on the GC content of the sequence. So, Partek Genomics Suite models these effects and attempts to remove it before background correction. Comparison of results with and without the probe sequence adjustments showed no differences in the numbers of TCs identified as significant for the first level of data filtration. It also did not impact the number of edge effects identified, suggesting that it may be best just to remove probe sets known to be associated with high GC content. Other false-positive results can be attributed to poorly hybridizing or nonresponsive probe sets. However, this scenario could also be interpreted, as the exon is not included in both groups, which is not a false-positive event. Currently, it is not possible to differentiate between the two. Many of the probe sets causing these types of artifacts are difficult to correct using the filtration methods suggested in analytical workflows. As more exon array data are accumulated and deposited in public repositories, it will be possible to build lists of poorly behaving probe sets across different tissues, cell types, etc., facilitating their removal prior to analysis. This concept can be extended to building lists of intronic, intergenic, and exonic probe sets for systematic analyses. This would make it possible to explore novel regions of transcription and overcome some of the issues surrounding technically generated false-positives. This has been implemented partly in the X:MAP software, a BioConductor/R package,28 where the probe set can be mapped to exons or introns. By comparing different systematic analyses, one would be able to build a more complete picture of the true alternative splicing events and reduce the dependence on visual inspection. Gathering these probe set lists is a rather daunting task, as genomic annotation is far from complete and ever-changing.29,30 However, if exon array data are going to contribute to this knowledge, this is a necessary step. It will aid the discovery of new, alternative splice events and find transcription events outside of known genes.
It is evident that data interpretation of exon-level results currently requires that it be overlaid onto a genomic profile. Using the highly conservative RefSeq annotation of the 866 TCs identified as being alternatively spliced (log2 signal, >3 filter), only 35.5% or 281/791 (excluding all false-positive or N calls) were known to show alternative splicing. To balance this, we also provide alternative splicing information from AceView,31 which gives alternative splicing information from transcript reconstruction by merging cDNA and genome sequences, allowing discovery of thousands of transcript variants. Using this database, 740/791 (93.6%) had been identified as having splice variants. The truth probably lies somewhere in between.
This paper illustrates the power of exon arrays in identifying different and possibly novel alternative splicing events (such as exon skipping and intron retention) and differential UTR use (e.g., transcription initiation and alternative polyadenylation). The current workflow-filtering steps are a tradeoff between the number of TCs that can be visually assessed to determine “false-positives”, arising from technical probe set behaviors or resulting from multiple testing, and true positives. With the perceived importance of alternative splicing at an all-time high, the need for robust and accurate tools capable of fully interrogating the complete transcriptome is critical to facilitating a clearer understanding of the regulation of biological processes. There is a large degree of imprecision when dealing with the human transcriptome as it pertains to alternative splicing, emphasizing the need for more research, which is now facilitated easily using exon arrays.
The authors thank Dr. Mark Hollier for performing the lymphocyte proliferation laboratory work. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the funding agency.
Financial support/Conflict of interest: Authors certify that they have no commercial associations that may pose a conflict of interest in connection with the submitted article.
This study adhered to human experimentation guidelines of the U.S. Department of Health and Human Services and complied with the Helsinki Declaration. All participants gave informed consent. The CDC Human Subjects Committee approved the study protocol.
The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the funding agency.