Comparison of hESC-Derived Cardiac and Neural Progenitors to Adult Tissues
To identify genes with common CP-NP differentiation or CP-specific AS patterns, we isolated homogenous populations of hESCs and cardiac precursors and compared them to a dataset of neural precursor differentiation
[10]. To determine the relationship between undifferentiated, progenitor and adult tissues, we performed an unbiased analysis using expression clustering and an analysis of cell-specific marker gene expression.
Homogenous populations of undifferentiated hESCs and CPs were isolated by modifying the H9 ESC line to stably express neomycin-and puromycin-resistance genes, driven by the pluripotent-cell-specific REX-1 and CP-specific myosin heavy chain alpha (MHCα or MHY6) promoters, respectively (see
Materials and Methods). In embryoid bodies that underwent puromycin selection and were differentiated for 40 days, the action potentials and axial force measurements of the resulting CPs were similar to those of normal fetal cardiomyocytes
[11]. RNA harvested from REX+ hESCs and CPs was analyzed with Affymetrix human exon 1.0 arrays; these data were combined with a dataset of two published hESC lines (Cythera and HUES6) and NPs derived from these hESCs
[10] and a dataset of 11 adult human tissues
[13].
Gene expression values from these exon arrays were determined for constitutive aligning or all probe sets for each Ensembl gene (
Materials and Methods) (
Dataset S1). As shown by hierarchical clustering
[31] of these gene expression profiles, the three hESC lines (H9, Cythera, and HUES6) were more closely correlated to each other than to their differentiated progenitors or adult tissues (). However, CPs and NPs, while more closely correlated to each other and undifferentiated hESCs, were less correlated to their
in vivo tissue counterparts (heart and cerebellum). For several reasons, such differences are not unexpected; adult tissues are composed of a variety of cell types, passaged cell lines were compared to adult cells, the arrays were processed in different laboratories, and the cells are of distinct genetic origins. Although gene expression profiles of CPs were not closely correlated with samples from adult heart, CP gene expression levels were similar to those of adult heart for all cardiac markers examined (). Both NP lines express neuron-specific markers
[10]. Thus, these progenitor cells are closely related to hESCs but retain cell-type-specific marker expression and therefore are appropriate cell systems for assessing AS and differentiation into cardiac and neural lineages.
AS Is a Key Feature of CP Differentiation
To identify alternative exons in day 40 CPs versus REX+ hESCs and link the results to predicted sequence changes that might alter protein expression/function, we created a free, open-source application called AltAnalyze (
http://www.AltAnalyze.org) (
Materials and Methods). For exon array analysis, AltAnalyze uses the SI approach
[19],
[20] to calculate a probe set fold change corrected for gene-expression and corresponding SI
t test and MiDAS p values. To identify higher-confidence alternative exons, only regulated probe sets overlapping with exons in annotated mRNAs (Ensembl or UCSC Genome Browser, including retained introns) were used for further analyses.
Of the 13,583 genes with evidence of expression, 16% (2,106) were predicted to have at least one alternative exon in the differentiation to CPs, as compared to 3,044 genes with up- or downregulated gene expression (). Of the alternatively regulated genes (ARGs) containing one or more alternative exons, 42% (876) had alternative exons with prior evidence of AS (defined here as alternative or mutually exclusive cassette exons, alternative splice sites, exon-bleeding
[15], alternative-C-terminal exons or retained introns; see ), 8% (170) had an alternative promoter (AP), and 7% (152) had both; the remainder occurred in constitutive exons (
Dataset S2). Thus, our analysis predicts that 3% of all Ensembl genes examined (876 of 29,151) are alternatively spliced relative to 10% (3,044 of 29,151) differentially expressed during the differentiation of hESCs to CPs.
| Table 1Alternative gene regulation during CP differentiation. |
As in earlier studies of NP differentiation
[10], pathway analysis of all alternative exons showed that serine/threonine protein kinases and genes with helicase activity were highly enriched during differentiation to CPs. We also found over-representation of actin cytoskeleton, RNA splicing, cell-cell adherens junction, chromatin binding, regulation of muscle contraction, and cell cycle checkpoint genes, among others, using the program GO-Elite (
Dataset S4).
Assessing the Impact of AS on Protein Domain and miRNA Binding Site Architecture
Although analysis of protein composition determined by AS is not entirely novel
[32],
[33], AltAnalyze has many novel features. For example, it incorporates protein, domain, and motif information from multiple protein annotation databases (UniProt, InterPro), detailed information on the competitive isoforms analyzed, and analyses of miRNA binding site predictions from multiple databases (PicTar, miRbase, miRanda and TargetScan). By default, domain/motif predictions are derived by comparing two protein isoform sequences—one that aligns to the alternative exon and another in which the exon is absent from the corresponding mRNA sequences (competitive isoforms). Since many aligning and nonaligning mRNAs can exist for each alternative exon, an algorithm is used to identify isoform pairs with the smallest differences in exon composition (). Identification of the corresponding protein (Ensembl or NCBI) or
in silico translation of these competitive isoforms allows AltAnalyze to identify protein domain or motif sequences (InterPro and UniProt) that differ in the inclusion of a target exon between the two isoforms (competitive isoform analysis) (). An alternative method analyzing domain/motif regulation is also used, whereby the genomic coordinates of probe sets that correspond to an alternative exon are compared to the genomic position of all InterPro domains and motifs for that gene (direct genomic alignment) ( and
Text S1). Since AltAnalyze is a freely distributed software package, these analyses can easily be applied to any exon-array dataset.
AS Modifies Predicted Protein Domain and miRNA Binding Site Architecture during Cardiac Differentiation
For alternative exons regulated during CP differentiation, predicted changes in domain/motif and miRNA binding site composition were examined with AltAnalyze. The majority of alternative exons during CP differentiation (79%), corresponded to competitive mRNA isoforms (sharing some exons, but not the probed exon) (
Materials and Methods) (
Dataset S2). Competitive isoform analysis predicted that one or more protein domains/motifs would be modified or absent in 62% of alternative exons. But with the direct genomic alignment method, only 32% of the alternative exons were predicted to affect domains/motifs. Although 27% of the direct genomic alignment predictions only occurred with this prediction method, these typically occurred in constitutive regions with no competitive isoforms identified. Thus, many of the domain/motif changes predict by competitive isoform analysis occurred as result of indirect protein differences (e.g., protein truncation), while others could only be identified with the direct genomic alignment method.
To determine whether certain domains or motifs were over-represented by both methods, we examined the associated over-representation z scores and permutation-based p values for all CP differentiation ARGs (
Materials and Methods). Using either the competitive isoform or direct genomic alignment methods, AltAnalyze identified over-representation of the laminin globular domain, myosin head motor region, and spectrin, plectin, collagen triple helix and rho-binding repeats. In addition, competitive isoform analysis alone identified the KCNQ, C-terminal and metalloprotease regions, and the actin-binding, spectrin-actin binding, KH, CUB, FERM, IQ, SH3, and protein kinase domains. Direct genomic alignment identified the START lipid-binding, semaphorin and Dbl homology domains, among others. Several of these same elements were also enriched with the AltAnalyze analysis of NP differentiation, including spectrin repeats and CUB, FERM, IQ, SH3, and the laminin globular and actin-binding domains (
Datasets S2 and
S3).
In addition to protein domains/motifs, 12.5% of ARGs associated with CP differentiation (264 of 2,106) resulted in the predicted gain or loss of at least one miRNA binding site (). Nearly half of the miRNA binding site changes (129 of 264) were in an exon with evidence of AS, often as a result of intron retention, exon bleeding, or an alternative C-terminus; the remainder were in a constitutive exon with a variable 3′ length and thus are not likely due to AS (
Dataset S2).
| Table 2Regulation of miRNA binding sites during CP differentiation. |
Interestingly, a recent study also observed alternative expression of exon regions containing miRNA binding sites, when these cells began to actively divide
[34]. This study was performed with the mouse Affymetrix exon array and revealed shorter untranslated regions (UTRs) and fewer miRNA binding sites in the mRNAs of proliferating CD4 positive T lymphocytes. Similarly, fewer predicted miRNA binding sites were found in pluripotent REX+ hESCs than in CPs (204 downregulated alternative exons with binding sites in hESCs out of 264, χ
2 p<0.01). The same trend was observed for NPs versus hESCs (
Datasets S2 and
S3). Thus, a loss of miRNA binding sites may be common to actively dividing cells. Using the same analysis applied to domains and motifs, only one miRNA binding site was over-represented among ARGs in CP differentiation (hsa-miR-219) and only one for NP differentiation (hsa-miR-487a). Neither miRNA has previously been associated with ESC differentiation.
AltAnalyze Identifies Known Differentiation Splicing Events That Correlate with Altered Protein Function
Several predicted splicing events identified in CPs were verified in previous analyses of hESC differentiation. These included genes that underwent AS in the differentiation to NPs (SLK, SORBS1
[10], and NFYA
[35]) and in cardiac/muscle differentiation (ATP2A2
[36],
[37], NF1
[38], PKM2
[39], and ANXA7
[40]). SLK had one of the largest exon inclusion SI scores in hESCs, and ANXA7 had one of the largest in CPs. RT-PCR analysis of six of these AS events (ANXA7, ATP2A2, NF1, PKM2, SLK, and VCL) in REX+ hESCs and CPs verified the expected changes in isoform expression ().
At least three of these verified AS events correspond to modified protein function/expression, producing differences in cell metabolism (PKM2)
[41],
[42], signaling (VCL)
[43], or mRNA stability (ATP2A2)
[36]. In each case, AltAnalyze predicted the regulation of protein domains or motifs previously associated with these differences. In the case of PKM2 or pyruvate kinase, two isoforms (M1 and M2) are expressed through mutually exclusive splicing (mutual gain and loss of a cassette exon)
[39]. Although the two alternatively spliced exons are the same length (167 base pairs) and have 60% protein sequence identity to each other, they differ in their tissue developmental expression patterns, protein motif composition, and
in vivo functions
[39]. M1 is largely present in normal adult heart, skeletal muscle, and brain and is not allosterically regulated by fructose-1,6-bisphosphate (FBP). M2 is present only during embryonic development and in tumors, is regulated by FBP, and promotes proliferation
[41],
[42]. Isoform expression levels and protein predictions by AltAnalyze matched the previously identified motif changes corresponding to functionally relevant differences in these proteins (loss of the UniProt-defined FBP binding region and intersubunit contact and upregulation of the M1 exon in CPs) ().
For vinculin (VCL), the gain of a vinculin/alpha-catenin sequence by AS is associated with altered ligand binding properties of the muscle form of the protein
[43]. For this same alternative exon, AltAnalyze predicted the same previously reported protein sequence difference. For ATP2A2 (cardiac sarco/endoplasmic reticulum calcium ATPase), intron retention of 4068 base pairs before the last exon increases the length of the cytoplasmic topological domain (45 amino acids [aa]) and 3′ UTR. Expression of the long C-terminal form (isoform 2b) of ATP2A2 (hESC-enriched) increases mRNA degradation of this transcript
in vitro [36]. AltAnalyze similarly predicted that this AS event would increase the length of the cytoplasmic topological domain (UniProt), in addition to the inclusion of several putative miRNA binding sites (hsa-miRNA-429, 200b, and 182), each supported by evidence from multiple miRNA binding site prediction algorithms (). CPs had decreased expression of the retained intron containing these miRNA binding sites relative to hESCs. Interestingly, two of these miRNAs, miRNA-200b and miRNA-182, were highly enriched in CPs derived from mouse ESCs
[44], suggesting a possible mechanism for the increased degradation of the long 3′UTR form. Thus, AltAnalyze suggested a new mechanism for the regulation of ATP2A2 expression by AS.
Regulation of Distinct Pathways for Cardiac- and Differentiation-Associated Splicing Events
The combination of cardiac and neural differentiation data provides a unique opportunity to define molecular profiles unique to or in common to specific differentiation paradigms. To identify AS events during CP differentiation that correspond to CP-specification or inhibition/promotion of differentiation, we used two-way ANOVA to compare alternative isoform expression between cardiac and neural differentiation (
Materials and Methods). Considering the large number of well-characterized cardiac, neural, and hESC markers, we initially applied this method to transcriptionally regulated genes for the two differentiation paradigms. Of 3,044 differentially expressed CP genes, 1,962 had a common CP-NP differentiation expression pattern in cardiac and neural differentiation (differentiation p<0.05), and 951 were preferentially regulated during CP differentiation (interaction p<0.05). As predicted, genes with the lowest ANOVA differentiation (common to CP and NP differentiation) p value corresponded to key pluripotency factors (e.g., LIN28, OCT3/4) and pathways (cell cycle control and regulation of pluripotency). Likewise, genes with the lowest ANOVA interaction (differing in CP and NP differentiation)
p value corresponded to well-described cardiac markers (e.g., TNNC1, TNNI1, TNNI3, MYH6, MYH7, PLN, GATA4, GATA6, NPPA, TBX5, TBX20) and pathways (early cardiac developmental, muscle proliferation, cardiac muscle contraction) when analyzed with GO-Elite (
Figure S1).
When applied to alternative exons regulated during CP differentiation, this ANOVA method identified 565 genes with a common CP-NP differentiation expression pattern and 414 genes with a CP-specific expression pattern (). In both groups, we considered only alternative exons associated with AS annotations by AltAnalyze. Three AS events identified from previous studies (SORBS1, SLK, and ATP2A2) were among the top 26 ranked genes (ANOVA p) in the two expression pattern groups examined.
When pathway analysis was applied to AS genes with a common CP-NP differentiation splicing pattern, the most enriched ontology categories/pathways were water binding, RNA and chromatin binding, integrin-mediated signaling, microtubule binding, extracellular matrix, and lipid transport (). In contrast, AS genes with a CP-specific pattern were enriched in pathways for phosphatidylinositol binding, sarcoplasmic reticulum, negative regulation of neurogenesis, regulation of heart contraction, Wnt receptor signaling, and regulation of cyclin-dependent protein kinase activity. Both sets were enriched in serine/threonine kinases, helicases, actin cytoskeletal, RNA splicing, and cell cycle checkpoint genes ( and
Dataset S4). These results imply that the loss of pluripotency corresponds to AS of genes that regulate transcription, cell-cell contact formation, and metabolism, while cardiac-enriched events favor contractile pathways, inhibition of neurogenesis, and Wnt signaling (
Dataset S4).
Confirmation of CP Differentiation AS by RT-PCR
Fifty alternative exons with a CP-specific or common CP-NP differentiation pattern were selected for further confirmation and in-depth analysis of domain/miRNA binding sites. When applied to a previously described dataset with comprehensive validation (knockdown of the splicing factor PTB)
[45], AltAnalyze identified a high percentage of true-positive splicing events (
Text S2). To better assess the impact of splicing on predicted domain/miRNA binding site composition, genes containing such predictions with few alternative exons were preferentially tested. RT-PCR analysis confirmed 44 of the 50 target alternative exons, with 37 displaying significantly larger shifts in isoform expression than the rest ( and
Figure S2). The six failed primer sets produced either inconclusive results or missing PCR products. For all genes except VCL, only one alternative exon per gene was tested, even if multiple exons were predicted.
Genes with some of the most pronounced confirmed changes and a common CP-NP differentiation AS pattern included those encoding serine/threonine kinases (SLK, FER, FYN, MARK3), spectrin-actin binding (SPTBN1, ADD3), cell cycle (MADD, PCBP4, SEPT6), and cell-cell communication (TJP1) proteins. Similarly regulated genes with a CP-specific AS pattern included those encoding calcium signaling (ASPH, ANXA7, ATP2A2), cell metabolism (PKM2, OGDH), cell cycle (NUMB, UBE4B, CSDE1, NF1, ANXA7), and double-stranded RNA binding (STAU1) proteins. Several of these confirmed AS events appeared to have cardiac/muscle-specific and common CP-NP differentiation patterns when examined with the entire adult tissue/cell line exon-array panel. This was the case for the genes KIF13A and CSDE1, each of which showed the highest alternative exon expression for hESCs or cardiac/muscle cells, respectively, when compared to all other tissues and cells (). Thus, most of the examined alternative exons have expression patterns consistent with those predicted by AltAnalyze and appear to mimic those in adult tissues.
Identifying Splicing Events Predicted to Modify Protein Function
The possible effects of AS on protein function are diverse and therefore challenging to predict bioinformatically. Since AltAnalyze identified confirmed domain/motif changes that correlate with changes in protein function (e.g., PKM2
[41],
[42] and VCL
[43]), we chose to further explore AltAnalyze's predictions for the 44 confirmed alternative exons. These analyses are useful for evaluating how splicing may impact protein domain/motif inclusion and composition for future validation studies.
High Degree of Specificity of Domain-Level Protein Predictions
Of the 44 confirmed splicing events for CP differentiation, 34 were initially predicted to alter protein domain or motif composition (). Although validation of these events requires careful
in vitro analyses, to evaluate the specificity of AltAnalyze protein predictions, we examined the genomic overlap of an alternative exon with domains/motifs (direct genomic alignment) and performed a more exhaustive competitive protein analysis to determine whether any comparisons yield an absence of changes in domain/motif composition (
Text S1). This exhaustive method examines all possible competitive protein isoform pairs and selects the pair that produces the smallest overall differences in protein sequence and domain/motif predictions.
| Table 3AltAnalyze functional predictions for confirmed CP differentiation AS events. |
Twenty-two of the 34 alternative exons were predicted to have domain/motif changes with both direct genomic alignment and competitive isoform analysis. These alternative exons should directly change the sequence or disrupt a domain/motif and thus represent higher-confidence predictions. Only one gene, LEFTY1, was predicted to alter the sequence of a domain (transforming growth factor β) with direct genomic alignment and not the competitive isoform analysis. In all but four of these 34 alternative exons, changes in domain/motif composition were also predicted by the exhaustive comparison method. Three of these four alternative exons were present in both untranslated and coding regions of the different possible isoforms. Since the exhaustive method is biased towards selection of competitive isoforms that produce no change in domain/motif composition, only competitive isoforms where the alternative exon was present in an untranslated region were selected. Of the remaining 30 alternative exons, 17 had identical domain/motif predictions with the exhaustive and the original competitive isoform analysis and 13 had almost identical predictions (largely the same but sometimes fewer domain/motif changes) with the exhaustive method (
Dataset S2). Thus, several of the domain/motif changes were only found with the competitive isoform analysis and not with direct genomic alignment or the exhaustive methods. However, it is unclear which predictions are false positives, since AS of a single exon could theoretically co-segregate with other AS or AP events in the same transcript.
Impact of Domain-Level Predictions on Known Biology
Predicted changes in protein domain/motif composition for confirmed splice events could be classified into four groups: truncation, disruption, exchange, and modification (). For protein truncation or disruption, we can infer potential functional consequences of the domain/motif change based on known biology of the protein and its interactions. For exchange or modification, we can only speculate that the protein function differs from that of the characterized isoform(s).
Protein Truncation with CP Differentiation
Nine of the confirmed AS events (CDC42, CLK1, EWSR1, FER, HDAC9, LRRFIP1, OGDH, VCL (exon 10-2), and WNK2) were predicted to introduce a premature stop-codon into the transcript, causing either protein truncation (>30% decrease in protein length) or absence of translation (e.g., nonsense-mediated decay)
[6]. In the majority of cases, except for FER and LRRFIP1, no protein sequence was found in public databases for the truncated isoform; therefore, AltAnalyze produced theoretical protein sequences based on
in silico translation. In four cases (WNK2, CLK1, HDAC9, EWSR1), the
in silico predicted protein was 2–30% of the length of the competitive isoform and thus likely not to be translated. In addition to these C-terminally truncated proteins, N-terminal truncation of CDC42BPA, HIF3A and NAV2 resulted in substantially shorter protein sequences (35–39%) with a resulting loss in domain/motif predictions. There did not appear to be a bias towards increased protein truncation in CPs or hESCs.
Since these splicing events are predicted to significantly reduce protein size and domain/motif composition, there is a much higher likelihood that these changes would disrupt protein function or prevent protein translation. For example, in cardiomyocytes, the large upregulation (~8-fold by AltAnalyze) of a cassette exon in the histone deacetylase HDAC9 protein is predicted to truncate the reference isoform from 1066 to 21 aa. HDAC9 typically represses expression of myocyte enhancer factor 2 (MEF2), a potent cardiac inducing transcription factor
[46]. Truncation or more likely absence of translation should thus alleviate HDAC9's repressive action on MEF2 transcription factors to promote cardiogenesis. Among the four MEF2 family members examined in this analysis, three were transcriptionally upregulated, with MEF2C increased 51-fold in CPs versus ESCs. In the case of hypoxia-inducible factor-3α (HIF3A) (common CP-NP differentiation pattern), the putative truncated isoform (lower band by RT-PCR) was expressed in both hESCs and CPs, while the full-length isoform was largely restricted to CPs. The truncated form of HIF3A, resulting from selection of an alternative 5′ splice site, is predicted to disrupt the DNA-binding, PAS, and nuclear translocation domains and the helix-loop-helix motif. HIF3A participates in the adaptive response to hypoxia as a transcriptional regulator of downstream genes
[47]. The precise function of this variant is unclear, but its exon and domain structure are similar to those of a mouse variant of this gene called inhibitory PAS domain protein, a dominant-negative regulator of HIF3A transcription
[47]. Thus, splicing of HIF3A may play a critical role in regulating hypoxic responses in pluripotent versus differentiated cells.
Disruption of Domains and Motifs with CP Differentiation
In addition to protein truncation, removal of protein sequences was also predicted to disrupt domains and motifs in 10 of the confirmed AS events (ABI2, ANXA7, ASPH, ATP2A2, KIF13A, NEDD4, PCBP4, SPTBN1, STAU1, and UBE4B). In CPs, these predictions include the disruption of the C2 calcium-dependent membrane targeting domain in the NEDD4 protein with exclusion of a 72-aa block of exons; intron retention in the PCBP4 gene, which produces a shorter N-terminus that disrupts a KH domain; and the disruption of a phosphopantetheine attachment site in the UBE4B protein with the insertion of a cassette exon encoding 129 aa. In hESCs, the disruption of presumptive domains was observed with the exclusion of 61-aa-encoding exon in the ABI2 protein that eliminated the predicted presence of a neutrophil cytosol factor domain; and removal of the first 9 aa from the double-stranded DNA binding domain in the STAU1 gene.
Since these domains are crucial for the annotated functions of these genes, the predicted sequence loss or disruption could affect their function considerably. An example is PCBP4, an RNA-binding protein and regulator of apoptosis characterized by presence of a KH domain. PCBP4 with an intact KH domain can suppress cell proliferation by inducing apoptosis, but is largely absent in hESCs. Since PCBP4 has a common CP-NP differentiation-splicing pattern, AS of this gene may be important in maintaining pluripotency in hESCs.
Two other, genes aspartyl beta-hydroxylase (ASPH) and spectrin, beta, non-erythrocytic 1 (SPTBN1) both had prior evidence of functionally distinct splice variants, linked in this case to the regulation of cardiac physiology. In the case of ASPH, the cardiac-enriched form specifically complexes with cardiac contractile components (calsequenstrin, triadin, and the ryanodine receptor)
[48] in the release of sarcoplasmic calcium; in contrast, the hESC-enriched form is highly expressed in neoplastic cells, has a distinct cellular localization, and has an additional enzymatic domain that regulates growth factor activity
[49],
[50]. Proteins encoded by SPTBN1 are found in the sarcomere along the muscle Z-line and likely contribute to structural stability
[51]. Consistent with AltAnalyze predictions, upregulation of a short form of SPTBN1 in CPs and NPs is associated with the loss of the pleckstrin homology domain (producing a loss of inositol-1,4,5 triphosphate binding)
[52] and when associated with spectrin alpha 2, the shorter protein forms more stable tetramers than the longer protein
[53]. Other splicing events had less clear functional implications on protein sequence, such as the microtubule-dependent motor protein KIF13A, in which removal of an exon encoding 35 aa results in the loss of one of three phosphoserine sites indicated by UniProt. If modulated directly by a protein kinase, however, such a change could alter the regulation of the resulting protein.
Exchange of Domain Sequences by Mutually Exclusive AS
Unlike the disruption of a critical protein domain, the functional impact on a domain with an altered sequence is less clear. As shown in the case of PKM2, mutual-exclusive splicing can alter the presence of functionally important protein residues without significant changes in overall protein sequence. This was also the case for the E2A immunoglobulin enhancer-binding factor TCF3 and for the serine/threonine and protein-tyrosine kinase FYN, in which a DNA-binding or kinase domain is specifically altered by the mutually exclusive exchange of a cassette exon of similar lengths, respectively. Interestingly, the mutually exclusive isoforms of TCF3 and the FYN have different biochemical properties
[54],
[55], suggesting that the domain-level alterations predicted by AltAnalyze correlate with function. In the case of TCF3, although the DNA-binding domain is 76% identical between the mutually exclusive isoforms, the hESC-enriched isoform (E12) has less DNA-binding affinity than the differentiation-enriched form (E47). Thus, AS of mutually exclusive exons is a potent means to modify specific residues within a sequence block without significantly changing overall protein length.
Modification of Domain/Motif Composition with AS
Although some confirmed AS events significantly changed a domain sequence, the domain was still called present in both alternative isoforms. This was the case for nine genes with confirmed alternative exons (ADD3, CAPZB, DNM1L, HISPPD2A, MARK3, SLK, TJP1, VCL, and VPS39). Specific examples include the removal of 32 aa in the C-terminal aldehyde ferredoxin oxidoreductase domain of the ADD3 protein, insertion of 13 aa into the dynamin GTPase region of DNM1L, modification of the C-terminal end of the F-actin capping beta subunit region of CAPZB, and removal of 11 aa from the N-terminal Citron homology domain (CNH) of VPS39. In each case, except VSP39, altering the sequence has unknown consequences for protein function. VPS39 is a putative adaptor protein that displays downregulation of a cassette exon in hESCs relative to CPs. The CNH domain in this protein is required for the clustering and fusion of late endosomes and lysosomes
[56]. Interestingly, the TRAP-1 homologue, the isoform that lacks this exon, does not mediate lysosomal clustering. Rather, it specifically associates with the transforming growth factor β signaling pathway, suggesting that modification of the CNH domain can alter its signaling properties.
Missing Domain Annotations Affect the Sensitivity of AltAnalyze Prediction
At least two confirmed AS events (NUMB and MADD) had no difference in domain-level predictions, but did have known functional isoform differences associated with the AS events
[57]. NUMB or the
Drosophila orthologue NUMB is involved in early cell-fate decisions
[57] and MADD (MAP-kinase activating death domain) protein is a membrane-bound cytoplasmic adaptor protein that interacts with the TNF-α receptor 1 to transduce apoptotic signals
[58]. Both genes affect pathways of proliferation and apoptosis. The CP-enriched isoform of NUMB is antiproliferative, whereas the hESC-enriched form (p71), with a longer proline-rich region (PRR), retains its proliferative properties
[59],
[60]. Likewise, expression of the CP-enriched MADD isoform (IG20) promotes apoptosis, whereas the hESC-enriched isoform (DENN) is anti-apoptotic and is typically over-expressed in tumors. These missing annotations were likely due to either an absence of the domain annotation (PRR) or lack of an annotated domain/motif. These examples illustrate the dependence of AltAnalyze's domain/motif predictions on up-to-date and valid annotations.
Developmental Regulation of miRNA Binding Site Inclusion by AS
A number of recent studies demonstrated a critical connection between miRNA expression and the maintenance of pluripotency or the differentiation of cardiac cells from hESCs. In our exon-array gene expression analysis, genes for 26 miRNAs were up- or downregulated during differentiation to CPs and NPs, including previously implicated pluripotency (mir-302a, 302b)
[61] and cardiac (mir-133, 23b, 26a)
[44] regulated miRNAs (
Dataset S1). Each of these miRNAs was appropriately segregated by ANOVA pattern analysis ().
Although much effort has been devoted to defining the expression patterns and novel targets of miRNAs, little is known about the role of AS in miRNA binding site inclusion in processed mRNA transcripts. Traditional gene expression microarrays focus on the coding regions of transcripts and ignore the noncoding exons, which can be alternatively spliced to produce different C-terminal exons or 3′UTRs of different lengths. However, exon-tiling arrays allow us to assess mRNA features in tandem with existing predictions for miRNA binding site position on a global basis.
Our analysis identified 264 genes with putative miRNA binding sites that overlap with alternative exons, including those undergoing AS. We tested 10 of these alternative exons by RT-PCR and confirmed nine, including the SPTBN1 and ASPH variants described earlier. Putative miRNA binding sites were included or excluded as a result of alternative cassette exons (ASPH, SEPT6), alternative C-terminal exons (CDC42, C6orf134), bleeding exons (SPTBN1), intron retention (ATP2A2, DERP6), or 3′UTRs with a longer or shorter sequence (LEFTY1, MAFB) (). At least one of these alternative exons (MAFB), with predicted regulation of a mir-130a binding site, is a known target of the predicted miRNA
[62] (). In addition to MAFB, several of the putative regulated binding sites were independently predicted by multiple miRNA binding site algorithms (ATP2A2, C6orf134, CDC42, LEFTY1).
Examination of miRNAs with previously established hESC or cardiac differentiation expression patterns identified binding sites for mir-302a, 302c (ESC), and mir-26a (cardiac) in the alternative bleeding exon of SPTBN1 and the afore mentioned binding sites in the 3′UTR of ATP2A2. These data suggests a new, largely AS-dependent mechanism for miRNA regulation of such genes. Since miRNAs can promote and inhibit the translation of targets dependent on cell cycle stage
[63], there is the opportunity for complex modes of regulation by these predicted targets
in vivo. Future studies will be aided by a global profile of miRNAs expression in hESCs and CPs, to determine which miRNAs are most likely to target these alternative mRNAs.
Summary and Conclusions
Using high-density expression profiling, a new method for isolating cardiomyocytes, and novel bioinformatics methods (AltAnalyze), we characterized AS along distinct developmental pathways that influence both pluripotency and the commitment to cardiac and neural lineages. In addition to new insights into these processes, these results offer novel targets for driving the expression of pluripotent cells to distinct lineages and inducing pluripotency from adult cells at the level of specific splice isoforms.
We identified genes that undergo AS during differentiation and observed several global trends which suggest that functional elements, such as protein domains and miRNA binding sites, are coordinately regulated by AS. Many alternative exons highlighted in our analysis were predicted to disrupt or modify functionally important sequences, such as DNA-binding and protein kinase domains that are likely impact protein function. Several of our domain-level predictions also correlated with known changes in protein isoform function or expression as a result of AS
[36],
[41]–
[43]. Thus, AltAnalyze may be useful for identifying AS events that alter protein function/expression by disrupting or modifying protein domains, motifs, or miRNA binding sites.
We identified and confirmed many splicing events that occurred along pathways of apoptosis and proliferation. Two genes confirmed by RT-PCR encode the apoptosis activators PCBP4 and MADD. Isoforms for both genes that induce apoptosis, were downregulated in hESCs but not CPs. Conversely, the proliferation-promoting isoform of NUMB is expressed in hESCs but is undetectable in CPs, while the anti-proliferation isoform is upregulated in CPs, as shown by RT-PCR. These results suggest the intriguing possibility that splicing may coordinately alter the functional repertoire of distinct members of the same pathway to elicit a biological effect, in this case, self-renewal in hESCs. We also observed AS of the apoptotic regulators CSDE1 and UBE4B along with previously demonstrated tumor suppressor genes ANXA7
[40], EWSR1
[64], and PKM2
[41]. Since both PKM2 and the proto-oncogene EWSR1 directly interact with the pluripotency transcription factor OCT3/4 to promote OCT3/4 activity
[41],
[64], specific isoforms of these genes may also be critical in the regulation of hESC maintenance.
Although only one confirmed CP-specific AS event (ASPH) was clearly linked to the regulation of cardiac physiology, several other novel CP-specific AS events were predicted to alter the composition of critical protein domains (CAPZB, UBE4B, HIF3A, HDAC9). One of the most intriguing was AS of the cardiac inhibitor HDAC9, which results in a highly truncated or nonexpressed form specifically in CPs. These data further support a role for AS in the direct specification of cardiac precursors.
Finally, analysis of the overlap between predicted miRNA binding sites and alternative exons revealed a potential mechanism by which specific cell types may regulate miRNA activity independently of miRNA expression. Such regulation involves AS of exons and differential expression of distal terminal exons, where the mechanism regulating exon length is unclear. Two recent analyses have further demonstrated the interaction between miRNAs and alternatively spliced isoforms
[65] or UTRs of different length
[34]. Since miRNA expression is thought to fine-tune protein expression downstream of transcription, alternative exon inclusion may be a parallel means of regulating miRNA binding site selection while still retaining full-length protein expression.
While we present several new analyses in this study, it will be essential to experimentally validate these protein and miRNA-level predictions. Additional computational analyses, such as comparative genomic sequence analysis, will also be important for delineating common and distinct cis-regulatory sequences that may regulate cardiac and neuronal splicing. Further refinement of our algorithm to decrease false negatives, similar to other approaches
[45],
[66], will also be important in identifying additional AS events. Finally, future application and refinement of these analyses to additional cell lineages and time points may yield greater resolution of AS events that will likely provide insights into the mechanisms of cell-fate commitment and maintenance of hESC pluripotency.