We have developed a novel approach for the analysis of exon-junction arrays that specifically searches for exon-skipping alternative splice events. Our proposed score evaluates the product of expression levels across exon-junctions to more accurately reflect exon-skipping. Our score also accounts for the overall expression level of a gene and uses an improved variance-stabilizing correction. We have shown that the combination of these approaches improves the discrimination of positive and negative control sets determined from independent data sources. We could not, however, directly compare our method with the existing methods in [6] because that analysis was based on individual array replicates, which we could not access. Nevertheless, utilizing another source of external validation, we demonstrate that the annotations for our alternatively spliced gene predictions were consistent with previous literature. There are several other exon-junction studies, but our method was not applicable to them because they were either disease specific or a detailed examination of a relatively small number of genes [39
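As a rough illustration of the product-based idea summarized above, the following minimal sketch combines the two junctions flanking an exon and expresses the result relative to gene-level expression. It is not the exact score used in this work: the function names, the arsinh-style generalized log used as the variance-stabilizing transform, and the constant `c` are all assumptions made for illustration.

```python
import math


def glog(x, c=1.0):
    """Generalized log transform (assumed stand-in for the
    variance-stabilizing correction); c is a hypothetical tuning constant."""
    return math.log((x + math.sqrt(x * x + c * c)) / 2.0)


def skipping_score(upstream, downstream, gene_level, c=1.0):
    """Illustrative per-tissue score for a single exon.

    upstream, downstream: intensities of the two junctions flanking the exon.
    gene_level: overall expression of the gene in the same tissue.
    """
    # The product of the flanking junctions becomes a sum after the transform.
    inclusion = glog(upstream, c) + glog(downstream, c)
    # Subtract the gene-level term twice so the score is relative to
    # overall expression rather than to absolute intensity.
    return inclusion - 2.0 * glog(gene_level, c)
```

Under these assumptions, an exon whose two flanking junctions are both low relative to the gene in one tissue receives a markedly lower score than an exon with only one low junction, which is the pattern expected for exon-skipping rather than for a single noisy probe.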
Following the array analysis, we examined our sequence predictions for two purposes: to validate our analysis of the exon-junction array and to predict novel splicing enhancers and silencers. The results from the sequence analysis provide a further source of validation for the quality of our alternative and constitutive exon predictions. Using randomization trials, we found that our predictions were enriched in sequences that were not found in random sets of sequences. We also identified several sequence signals that are consistent with the experimental literature (e.g., GGG triplets and pyrimidine-rich motifs in introns) and identified several motifs that are highly specific. For example, the known exonic sequence enhancer CACC was discovered in the constitutive exons but not in the alternative exons. Another known enhancer, GAAGAA, appears to be more specific to alternative exons than to constitutive exons. Furthermore, we found that the occurrence of GAAGAA is biased towards exons with weak splice sites, while the other identified motifs associated with alternatively skipped exons show no specificity for splice site strength. The sequence analysis also shows that, although originally developed for gene expression data [50], correlation-based methods utilizing whole-genome data, such as REDUCE, are applicable to splice arrays and corroborate the k-mer enrichment analysis without relying on pre-determined cutoffs.
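The randomization trials mentioned above can be sketched as a simple empirical test. This is a minimal illustration under stated assumptions, not the procedure used in this work: the function names, the "sequence contains the k-mer at least once" statistic, the number of trials, and the sampling of random sets from a single background pool are all hypothetical choices.

```python
import random


def count_kmer(seqs, kmer):
    """Number of sequences that contain the k-mer at least once."""
    return sum(kmer in s for s in seqs)


def enrichment_pvalue(predicted, background, kmer, trials=1000, seed=0):
    """Empirical p-value for k-mer enrichment in a predicted set.

    Random sets of the same size as `predicted` are drawn without
    replacement from `background` (an assumed pool of comparable sequences).
    """
    rng = random.Random(seed)
    observed = count_kmer(predicted, kmer)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(background, len(predicted))
        # Count random sets that look at least as enriched as the prediction.
        if count_kmer(sample, kmer) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (trials + 1)
```

A small p-value for a motif such as GAAGAA in the alternative exons, together with a large one in the constitutive exons, would correspond to the kind of set-specific enrichment described in this section.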
By our definition, the alternative exons show patterns of exon skipping among different tissues. The presence of known exonic enhancers in these sequences supports the hypothesis that, depending on tissue-specific expression, the corresponding binding factors enhance the splicing of these exons, which would otherwise be skipped. An alternative hypothesis, not supported by these results but also discussed in the literature [10], is that silencer sequences in the exon or flanking introns prevent proper splicing and are responsible for exon-skipping. These hypotheses are not mutually exclusive because both enhancer and silencer sequences may function cooperatively in splicing regulation. Furthermore, some sequence elements have been shown to act as both enhancers and silencers [65
A useful feature of the Rosetta array is that both fetal and adult samples were included for three different tissues. We used this to develop a method for predicting genes with pairwise tissue differences in exon-skipping and applied our procedure to explore both the cis- and trans-regulation of splicing during development. Our method for pairwise tissue analysis is not limited to the developmental comparison but can be extended to other tissues or cell lines (e.g., cancer versus normal cell lines). We made gene predictions for three tissue types (lung, brain and liver) but considered only their intersection in order to focus on general developmental alternative splicing, a choice motivated by observed expression changes in specific SR proteins. Our final predictions for development-related alternative splicing were consistent with functional annotation and literature searches. In our sequence analysis of the development-related predictions, we found that a form of the GAAGAA motif may also have a role in alternative splicing during development. Furthermore, the changes in gene expression between fetal and adult tissues for several SR proteins that bind to this motif provide further evidence for their roles in developmental regulation.
Because of the array design, our work focuses on the detection of exon-skipping. However, these data can also be used to predict and analyze other types of alternative splice events. In particular, if alternative splice site selection or intron retention occurs, we would observe variation in a single exon-junction across tissues, but not necessarily in consecutive junctions, as observed with exon-skipping. Therefore, a product summarization step would not be necessary, but we could adapt our procedure in other ways to predict alternative splice site selection and intron retention. We will not, however, be able to discriminate between these other types of splice events because of the nature of the array design.
A recent direction taken by several groups is to use sequence conservation across multiple species to aid in the search for enhancers and silencers [22]. Comparative approaches have also been used to examine the conservation of alternative splice events by comparing genomic or EST data from multiple species [29]. As more splice array data become available from different organisms and tissues, there may also be opportunities to explore the conservation of splicing events through meta-analysis of splice arrays, without relying on ESTs from orthologous genes, which are often limited across species.