Rapid advances in high-throughput techniques have led to a deluge of biological datasets of many types, including sequence, expression, structure, and ontology data. One of our primary goals in this study was to develop an approach for integrating heterogeneous datasets and combining genome-wide association studies with functional genomics. We developed a systematic approach that provides mechanistic or functional insight into how SNPs might affect alternative splicing and, conversely, increases our understanding of the genetic regulation of alternative splicing. We investigated the functional implications of intronic SNPs located in splicing regulatory elements that enhance splice-junction recognition, including whether these SNPs cause protein domain changes or are thought to contribute to human traits or diseases. Taken together, our results provide a primary resource for characterizing the functional role of genetic variation in the etiology of complex human traits and diseases. Additionally, we contribute to the compendium of primary functional knowledge of intronic SNPs. This latter point is particularly important as we move toward analyzing sequence data from complex human diseases, where knowledge of functional variants, including those involved in alternative splicing, will be key to making progress given the sheer number of genetic variants to be examined.
ISEs assist in splice-site recognition of adjacent exons and promote exon inclusion. ISE-mediated splicing control depends on many variables, including intron length, the density of ISEs in a given intron [32], and the strength of the 5′/3′ splice-site sequences. Non-traditional splicing regulation may also occur without known splicing regulators [14], [38]. Furthermore, the distance between an ISE and the downstream 5′ splice site has been shown to range from 6 to 500 nucleotides [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], and some ISEs containing UGCAUG motifs may lie more than a kilobase downstream [49], [50].
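The distance ranges above can be made concrete with a small motif scan. The sketch below is an illustration only, not the pipeline used in this study: it assumes the downstream intron is available as a DNA string on the transcribed strand, starting at the first intronic nucleotide after the exon, and it searches for the DNA form (TGCATG) of the UGCAUG motif. The function names and the 500-nt cut-off are hypothetical choices made for illustration.

```python
import re

# Hypothetical illustration: DNA form of the RNA motif UGCAUG.
UGCAUG_DNA = "TGCATG"


def motif_offsets(intron: str, motif: str = UGCAUG_DNA) -> list[int]:
    """Return 0-based offsets of every motif occurrence, overlaps included.

    Assumes `intron` starts at the first intronic nucleotide downstream of
    the exon, so an offset approximates the distance in nt from the 5' splice
    site. RNA input (containing U) is converted to DNA letters first.
    """
    seq = intron.upper().replace("U", "T")
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = f"(?={re.escape(motif)})"
    return [m.start() for m in re.finditer(pattern, seq)]


def split_by_distance(offsets: list[int], proximal_limit: int = 500) -> dict[str, list[int]]:
    """Group hits into proximal (< proximal_limit nt) and distal (>= limit).

    The default of 500 nt echoes the 6-500 nt range cited above; it is an
    illustrative threshold, not a biological rule.
    """
    return {
        "proximal": [o for o in offsets if o < proximal_limit],
        "distal": [o for o in offsets if o >= proximal_limit],
    }


if __name__ == "__main__":
    # Toy intron: a proximal motif at offset 26 and a distal one at offset 1232.
    toy_intron = "GTAAGT" + "A" * 20 + "TGCATG" + "C" * 1200 + "TGCATG"
    hits = motif_offsets(toy_intron)
    print(hits)                     # [26, 1232]
    print(split_by_distance(hits))  # {'proximal': [26], 'distal': [1232]}
```

The lookahead pattern is used so that tandem, overlapping copies of the motif are each counted.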
We tested multiple distances (<60 bp, 200 bp, 1 kbp, and 5 kbp) between ISEs and adjacent skipped exons to determine whether there is a particular distance at which the contained ISE SNPs are more likely to be complex trait-associated SNPs, as defined by their inclusion in the NHGRI GWAS catalog. The observation that ISE-associated SNPs were enriched for trait-associated SNPs at every distance boundary supports the idea that ISEs do not require proximity to their associated splice site in order to be effective. Thus, their mechanism of action may extend beyond recruiting the splicing machinery to include other mechanisms not considered in this study. This evidence may also suggest the need for extended ISE motif predictions and analyses that include genomic regions farther from the exon of interest.
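A window-based enrichment test of this kind can be sketched as follows. The exact statistical test is not stated in this paragraph, so the choice here is an assumption: a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) against a background SNP set that contains all the ISE SNPs. The `Snp` record, its fields, and the background counts are hypothetical; the window boundaries are those listed above, applied as cumulative upper limits ("within w bp of the exon").

```python
from dataclasses import dataclass
from math import comb


@dataclass
class Snp:
    distance_bp: int       # hypothetical field: distance from SNP to the adjacent skipped exon
    in_gwas_catalog: bool  # True if the SNP is listed in the NHGRI GWAS catalog


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


def window_enrichment(snps, background_total, background_gwas,
                      windows=(60, 200, 1_000, 5_000)):
    """For each cumulative window, return (n_snps, n_gwas, fold_enrichment, p_value).

    Assumes every SNP in `snps` is also part of the background set, which has
    `background_total` SNPs of which `background_gwas` are GWAS-catalog SNPs.
    """
    baseline_rate = background_gwas / background_total
    results = {}
    for w in windows:
        in_window = [s for s in snps if s.distance_bp < w]
        n = len(in_window)
        k = sum(s.in_gwas_catalog for s in in_window)
        fold = (k / n) / baseline_rate if n else float("nan")
        results[w] = (n, k, fold, hypergeom_sf(k, background_total, background_gwas, n))
    return results


if __name__ == "__main__":
    toy = [Snp(50, True), Snp(150, False), Snp(800, True), Snp(4000, False)]
    for w, (n, k, fold, p) in window_enrichment(toy, 100, 10).items():
        print(f"<{w} bp: {k}/{n} GWAS SNPs, fold={fold:.2f}, p={p:.3g}")
```

Because the windows are cumulative, each larger window contains the SNPs of the smaller ones; disjoint distance bins would be a reasonable alternative design if the question were which specific range drives the enrichment.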
At the cellular level, splicing regulation is a multifaceted process that is initiated by splice-site recognition and guided by SREs. Components of the spliceosome are then recruited to the correct splice junction, branch point, or other splicing loci. However, while many elements that contribute to the splicing code have been identified, their combinatorial effects have yet to be comprehensively characterized. Because our study focuses only on ISE-associated SNPs, it is important to note that this is a simplification: ISEs are believed to work in conjunction with other SREs (ESSs, ESEs and ISSs). Their combined effect depends on the relative strength of the regulatory elements themselves, the strength or weakness of the splice site, and the lengths of the associated introns and exons. Whether these elements act synergistically or antagonistically is believed to be context-dependent, relating not only to their own sequence location but also to the presence and relative location of other SREs. The enhancer and silencer characteristics of these additional elements could be explored in a future study using the method put forth in this paper.
Alternative splicing events have been shown to disrupt entire protein domains [35] and most often affect certain protein kinase domains and coiled-coil sequences embedded in transmembrane regions [51]. Here, we evaluated the functional impact of alternative splicing on each relevant protein, because aberrantly skipped exons can disrupt protein domains fully or partially, or can alter the final structure and function of a protein if the disruption falls in a region critical for biological activity, affinity or folding [36]. To this end, we investigated experimentally verified skipped exons regulated by intronic SNPs using protein domain analyses, which provide a systematic molecular basis for determining how intronic SNPs affect the resulting AS variants. This analysis was also used to infer the functionality of intronic SNPs whose role in shaping final protein products has otherwise remained elusive.
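Classifying a skipped exon's effect on protein domains (full versus partial disruption) reduces to an interval-overlap check once both are expressed in residue coordinates. The sketch below is a hypothetical illustration, not the annotation pipeline used here; it assumes the skipped exon has already been mapped to 1-based, inclusive coding-sequence positions and that domain boundaries come from a domain database in residue coordinates. The `Domain` type, the function names and the toy boundaries are illustrative.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, 1-based, inclusive


def cds_to_residue(cds_pos: int) -> int:
    """Map a 1-based coding-sequence nucleotide position to its residue number."""
    return (cds_pos - 1) // 3 + 1


def classify_domain_impact(exon_start: int, exon_end: int,
                           domains: list[Domain]) -> dict[str, str]:
    """Label each domain as fully removed, partially disrupted or unaffected
    by skipping an exon that encodes residues exon_start..exon_end."""
    impacts = {}
    for d in domains:
        overlap = min(d.end, exon_end) - max(d.start, exon_start) + 1
        if overlap <= 0:
            impacts[d.name] = "unaffected"
        elif exon_start <= d.start and d.end <= exon_end:
            impacts[d.name] = "fully removed"
        else:
            impacts[d.name] = "partially disrupted"
    return impacts


if __name__ == "__main__":
    # Toy protein with hypothetical domain boundaries.
    domains = [Domain("kinase", 10, 40), Domain("coiled_coil", 60, 90),
               Domain("tail", 95, 130)]
    # Skipped exon spans CDS nucleotides 163..300, i.e. residues 55..100.
    start, end = cds_to_residue(163), cds_to_residue(300)
    print(classify_domain_impact(start, end, domains))
```

An exon boundary that splits a codon is rounded to the residue containing that nucleotide, so the affected residue range may be slightly conservative at either end.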
In addition, exon-skipping events can affect protein function if the length of the skipped exon is not a multiple of 3. In this case, the reading frame of all downstream exons is shifted. This change can produce an entirely different amino acid sequence and may introduce premature termination codons, which, depending on their position in the transcript, could trigger nonsense-mediated decay (NMD) [52]. Thus, even though ISE SNPs are, by definition, not located within protein-coding regions, such polymorphisms can potentially change the resulting protein beyond the loss of a single exon, through shifts in the reading frame.
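The frame-shift argument can be checked mechanically. The sketch below is a simplified, hypothetical illustration: it assumes the transcript is given as a list of exon sequences whose first element begins at the start codon (5′ UTR removed), scans for the first in-frame stop codon, and applies the widely used heuristic that a termination codon lying more than about 50 to 55 nt upstream of the last exon-exon junction is predicted to trigger NMD. The 50-nt default and all function names are assumptions.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def causes_frameshift(exons: list[str], skipped: int) -> bool:
    """Skipping an exon shifts the downstream frame unless its length is a multiple of 3."""
    return len(exons[skipped]) % 3 != 0


def skip_exon(exons: list[str], skipped: int) -> list[str]:
    """Return the exon list with one exon (0-based index) removed."""
    return [e for i, e in enumerate(exons) if i != skipped]


def first_stop(seq: str) -> Optional[int]:
    """0-based position of the first in-frame stop codon, reading from position 0."""
    for pos in range(0, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos
    return None


def predicts_nmd(exons: list[str], threshold_nt: int = 50) -> bool:
    """Apply the '50-nt rule': NMD is predicted when the first in-frame stop codon
    lies more than threshold_nt upstream of the last exon-exon junction."""
    if len(exons) < 2:
        return False  # no exon-exon junction, so the rule does not apply
    stop = first_stop("".join(exons).upper())
    if stop is None:
        return False  # no in-frame stop found in the supplied sequence
    last_junction = sum(len(e) for e in exons[:-1])
    return last_junction - stop > threshold_nt


if __name__ == "__main__":
    # Toy transcript: the exon at index 1 is a single nucleotide, so skipping it
    # shifts the frame and exposes a TAA stop codon at the start of the next exon.
    exons = ["ATGGCC", "G", "TAA" + "CCC" * 30, "CCTAAGGG"]
    print(causes_frameshift(exons, 1))        # True
    print(predicts_nmd(exons))                # False: normal stop lies in the last exon
    print(predicts_nmd(skip_exon(exons, 1)))  # True: PTC is 93 nt upstream of the last junction
```

Real transcripts also need the 3′ UTR exons and the actual stop-codon context; this toy model only captures the frame arithmetic and the position rule.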
Exon skipping is the most common alternative splicing (AS) mode, accounting for 40% of AS events in higher eukaryotes [53], [54]. Alternative 3′/5′ splice sites and intron retention are the second (18.4%) and third (7.9%) most common types, respectively. The exon array (Affymetrix GeneChip Human Exon 1.0 ST Array) has been shown to be more useful for investigating exon skipping and retained intron events than for other types of AS events, such as alternative 5′/3′ splice sites and alternative polyadenylation sites [55]. For this reason, the current study focused on exon skipping events.
In our study, we used this small set of experimentally validated skipped exons to explore possible functional mechanisms by which intronic SNPs affect exon skipping. A larger data set of experimentally validated skipped exons and ISE SNPs should be studied in the future. Additionally, to further characterize inter-individual variability in exon skipping, individual sequence data should be integrated with our approach to investigate other SNPs located in the 5′ and 3′ splice sites of skipped exons. Although we have shown that ISE SNPs are enriched for cis-eQTLs and have identified an association between genotype and exon skipping in the HapMap samples (using the SI statistic), our findings may be limited by the tissue specificity of eQTLs [56].
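The SI (splicing index) statistic compares exon-level signal, normalized to gene-level signal, between groups. One common formulation, used here as an assumption rather than as this study's exact implementation, computes a normalized intensity NI = exon signal / gene signal per sample and then SI = log2(mean NI in group A / mean NI in group B). The sketch below groups HapMap-style samples by genotype; the sample tuples, genotype labels and function names are hypothetical, and signals are assumed to be on a linear (not log) scale.

```python
from math import log2
from statistics import mean


def normalized_intensity(exon_signal: float, gene_signal: float) -> float:
    """Exon signal relative to the gene-level signal of the same sample."""
    return exon_signal / gene_signal


def splicing_index(samples: list[tuple[str, float, float]],
                   group_a: str, group_b: str) -> float:
    """log2 ratio of mean normalized exon intensity between two genotype groups.

    `samples` holds (genotype, exon_signal, gene_signal) tuples on a linear
    scale. A negative value means the exon is relatively less included (more
    often skipped) in group_a than in group_b.
    """
    def mean_ni(group: str) -> float:
        values = [normalized_intensity(exon, gene)
                  for genotype, exon, gene in samples if genotype == group]
        if not values:
            raise ValueError(f"no samples with genotype {group!r}")
        return mean(values)

    return log2(mean_ni(group_a) / mean_ni(group_b))


if __name__ == "__main__":
    samples = [("AA", 8.0, 16.0), ("AA", 4.0, 8.0),
               ("GG", 2.0, 16.0), ("GG", 1.0, 8.0)]
    print(splicing_index(samples, "GG", "AA"))  # -2.0: exon relatively skipped in GG homozygotes
```

Normalizing each exon by its own gene's signal is what separates a change in exon inclusion from a change in overall gene expression, which is why the statistic is suited to genotype-versus-skipping comparisons.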
In this paper we propose a systematic approach to integrating sequence, expression and genetic (genotype/phenotype) data in order to elucidate the impact of genetic variation on exon skipping and its importance in complex traits and diseases. Understanding the genetic regulation of splicing is critical for interpreting abnormal or physiological changes in AS, since the currently characterized human splicing regulators cannot by themselves account for the vast number of splicing events known to occur in humans. We have shown that intronic SNPs are associated with exon skipping events, that these SNPs are associated with complex traits, and that they are predicted to result in protein domain changes. While additional studies are needed to fully understand the role that genetic variation in SREs may play in alternative splicing, and how much AS-associated genetic variation contributes to common disease, this study supports continuing such investigations.