Although these examples serve to demonstrate that genetic mutations can result in human disease by disrupting splicing, they have much broader implications. Most common SNPs are not thought to cause human disease. However, it is possible that they are not aphenotypic. Perhaps these SNPs are at the root of individual variation in the human population. If this is true, however, it is unclear how these SNPs manifest themselves phenotypically.
The recent studies by Hull et al. and Kwan et al. have explored this area to search for variations in alternative splicing within the human population. The two groups have taken very different approaches, but both have made use of the HapMap Project data (Box 2) to identify common SNPs that correlate with the observed splicing changes.
Box 2. The HapMap project. In 2002, the International HapMap Project began with the goal of cataloging common genetic variants in the human genome and the distribution of these variants among populations throughout the world. To do this, 1 million single nucleotide polymorphisms (SNPs) distributed evenly throughout the human genome were genotyped in 269 samples from four populations [10]. These samples came from 90 individuals from Ibadan, Nigeria, 90 individuals from Utah, USA, 45 individuals from Beijing, China, and 44 individuals from Tokyo, Japan. An important outcome of this project was the identification of haplotype blocks, regions of the genome in which closely located SNPs are co-inherited. Because of this co-inheritance, the patterns of inheritance for large regions of the genome can be traced by analyzing only a subset of the known SNPs in each haplotype block. The information derived from the HapMap project, along with the use of genome-wide genotyping assays, has dramatically increased the pace at which researchers can identify genes associated with inherited human disorders such as myocardial infarction [11], diabetes [12], and asthma [13]. Once these disease genes are identified, it is important to understand the mechanisms by which the causative mutations give rise to the disease state.
First, Hull et al. took a directed approach, selecting for study exons that are likely to be differentially spliced in the human population. They reasoned that, for any exon with this property, the mRNA isoforms containing and lacking the exon should both be commonly observed. They identified 250 exons that satisfied their criteria and verified that, for 70 of these exons, both isoforms were indeed expressed in lymphoblastoid cell lines (LCLs) generated from the CEPH HapMap project. A comparison of the splicing patterns and genotypes of 22 LCLs revealed that six of these exons were spliced in an allele-specific manner.
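The comparison of splicing patterns with genotypes described above can be illustrated with a minimal sketch. It is not the procedure used by Hull et al.; it simply assumes that each cell line yields one measurement for the exon-included isoform and one for the exon-skipped isoform, converts them to an inclusion fraction, and asks whether that fraction tracks the number of copies of one SNP allele. The function names, the allele labels, and every value in `SAMPLES` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def inclusion_fraction(inclusion, skipping):
    """Fraction of transcripts that contain the exon (0 to 1)."""
    total = inclusion + skipping
    if total == 0:
        raise ValueError("no signal for either isoform")
    return inclusion / total


def mean_inclusion_by_genotype(samples):
    """Average the inclusion fraction over cell lines sharing a genotype."""
    groups = defaultdict(list)
    for genotype, inclusion, skipping in samples:
        groups[genotype].append(inclusion_fraction(inclusion, skipping))
    return {g: sum(v) / len(v) for g, v in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: (genotype at the SNP, inclusion signal, skipping signal)
SAMPLES = [
    ("AA", 90, 10),
    ("AA", 80, 20),
    ("AG", 50, 50),
    ("AG", 60, 40),
    ("GG", 20, 80),
    ("GG", 10, 90),
]

means = mean_inclusion_by_genotype(SAMPLES)
g_dosage = [genotype.count("G") for genotype, _, _ in SAMPLES]
fractions = [inclusion_fraction(i, s) for _, i, s in SAMPLES]
r = pearson(g_dosage, fractions)

for genotype in sorted(means):
    print(f"{genotype}: mean inclusion {means[genotype]:.2f}")
print(f"correlation between G-allele dosage and inclusion: {r:.2f}")
```

In this invented data set, inclusion falls as the number of G alleles rises, which is the kind of pattern that would flag an exon as a candidate for allele-specific splicing. A real analysis would also need replicate measurements and a significance test corrected for the number of exons examined.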
By contrast, Kwan et al. took a more global approach. They analyzed splicing in two LCLs from the CEPH HapMap project on microarrays containing 1.4 million probe sets that query 1 million known and/or predicted exons. They identified nine exons that are differentially spliced between the two LCLs and subsequently showed that three exons within the genes encoding 2′,5′-oligoadenylate synthetase 1 (OAS1), calpastatin (CAST), and cartilage-associated protein (CRTAP) are spliced in an allele-specific manner.
Although some of the SNPs linked to these splicing changes are located far from the affected exons identified in these two studies, others are located in close proximity to, or even within, the alternatively spliced exon, providing an opportunity to predict the mechanisms by which the sequence differences could impact the splicing pattern. Kwan et al. found an SNP located at the 5′ splice site of the affected exon in CAST, suggesting that this SNP most likely impacts the efficiency of U1 small nuclear ribonucleoprotein particle (snRNP) binding. Four of the genes identified by Hull et al. contained SNPs within the affected exon. In SH3YL1, these SNPs are located just downstream of the 3′ splice site, suggesting that they modulate the efficiency of 3′ splice site recognition. However, for ZDHHC6, the SNPs are located in the middle of the exon and most likely create or destroy either an ESE or an ESS. Finally, the SNPs within CD46 are located within the flanking introns. The CD46 SNP is located just downstream of the 5′ splice site and most likely impacts the efficiency of 5′ splice site recognition. Interestingly, in IFI16, the SNP is located 1300 nucleotides (nt) upstream of the 3′ splice site, making it hard to predict how this SNP might impact splicing. Nonetheless, these examples illustrate that identifying SNPs that modulate splicing can provide clues to the mechanisms by which they exert their effect.
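The positional reasoning used in these examples can be written down as a simple rule of thumb. The sketch below is illustrative only and is not taken from either study. It assumes coordinates that run 5′ to 3′ along the transcript, and it uses window sizes that are assumptions rather than established cut-offs: 20 intronic and 3 exonic nucleotides around the 3′ splice site, and 3 exonic and 6 intronic nucleotides around the 5′ splice site. The exon coordinates and SNP positions are invented.

```python
def classify_snp(snp_pos, exon_start, exon_end,
                 ss3_intron_nt=20, ss3_exon_nt=3,
                 ss5_exon_nt=3, ss5_intron_nt=6):
    """Give a coarse label for where a SNP sits relative to one exon.

    Coordinates run 5'->3' along the transcript; exon_start and exon_end
    are the first and last exonic nucleotides (inclusive). The window
    sizes are illustrative assumptions, not validated thresholds.
    """
    if exon_start > exon_end:
        raise ValueError("exon_start must not exceed exon_end")
    # Window around the upstream intron/exon boundary (3' splice site).
    if exon_start - ss3_intron_nt <= snp_pos <= exon_start + ss3_exon_nt - 1:
        return "3' splice site region"
    # Window around the exon/downstream intron boundary (5' splice site).
    if exon_end - ss5_exon_nt + 1 <= snp_pos <= exon_end + ss5_intron_nt:
        return "5' splice site region"
    # Remaining exonic positions could alter an enhancer or silencer.
    if exon_start <= snp_pos <= exon_end:
        return "exon interior (candidate ESE/ESS)"
    # Everything else is intronic and outside both windows; distances are
    # counted from the nearest exonic nucleotide.
    if snp_pos < exon_start:
        return f"upstream intron, {exon_start - snp_pos} nt from the 3' splice site"
    return f"downstream intron, {snp_pos - exon_end} nt from the 5' splice site"


# Hypothetical exon spanning positions 2000-2150 of a transcript.
EXON = (2000, 2150)

for position in (2001, 2075, 2153, 700):
    print(position, "->", classify_snp(position, *EXON))
```

A label of this kind only suggests a hypothesis. As the IFI16 case shows, a SNP that falls outside every window can still correlate with a splicing change, and the mechanism then has to be worked out experimentally.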
Together, these two studies identified only nine exons whose allele-specific splicing correlates with SNPs that are common in the human population, which might suggest that allele-specific splicing is uncommon. However, it is more likely that the number of exons identified in these studies vastly underestimates the true extent of this phenomenon. First, Hull et al. selected only 250 exons from the human genome to study. Second, although Kwan et al. used exon arrays, which should provide a genome-wide view of this phenomenon, the authors proceeded to experimentally validate only a small subset (20 of ~1000 candidate events). Furthermore, the authors examined RNA isolated from only two cell lines on these arrays, and therefore many of the common human haplotypes were not examined. Third, and perhaps most importantly, although these arrays can identify alternative splicing changes, several issues complicate interpretation of the results: (i) the algorithms used to deconvolute the microarray results require one to assume the splicing patterns for each gene, and a wrong assumption can lead one astray; (ii) these arrays have difficulty identifying cases where the splicing changes are subtle, even though such changes might be significant, both statistically and functionally; and (iii) the arrays can be 'noisy', with a high rate of false positives and false negatives; the study by Kwan et al. had a 55% false-discovery rate. Several of these issues could be addressed in the future by using similar arrays supplemented with splice junction probes, or by using even newer technologies. Although these caveats should be kept in mind, this approach did allow the authors to identify bona fide examples of allele-specific splicing events.
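The scale of the likely underestimate can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes that the nine differentially spliced exons are the confirmed portion of the 20 candidates that were tested, which is consistent with the reported 55% false-discovery rate but is an inference rather than a statement from the study. It also assumes, purely for illustration, that the untested candidates would validate at the same rate. The function and variable names are hypothetical.

```python
def false_discovery_rate(tested, confirmed):
    """Fraction of tested candidates that failed validation."""
    if tested <= 0 or not 0 <= confirmed <= tested:
        raise ValueError("need 0 <= confirmed <= tested and tested > 0")
    return (tested - confirmed) / tested


TESTED = 20        # candidates taken forward to validation (from the text)
CONFIRMED = 9      # assumed to be the nine differentially spliced exons
CANDIDATES = 1000  # approximate number of candidate events on the arrays

fdr = false_discovery_rate(TESTED, CONFIRMED)
# Extrapolation: assumes untested candidates behave like the tested ones.
expected_genuine = CANDIDATES * (1 - fdr)

print(f"false-discovery rate: {fdr:.0%}")
print(f"candidates expected to be genuine: about {expected_genuine:.0f}")
```

Under these assumptions, several hundred of the array candidates would be genuine differences between just two cell lines, far more than the handful that were confirmed experimentally.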