Alternative splicing (AS) is an important cellular process that leads to multiple mRNA isoforms from a single pre-mRNA in eukaryotic organisms. Plant AS events used to be regarded as rare. However, a growing number of computational studies have now demonstrated that the frequency of alternatively spliced genes in plants is higher than previously estimated [1, 2]. Large-scale EST-genome alignments revealed that 20–30% of expressed genes are alternatively spliced in Arabidopsis thaliana (At) and rice (Oryza sativa, Os) [1, 2]. A recent study using gapped alignments of EST pairs (EST-EST) surveyed 11 plant species and suggested that overall AS frequencies vary greatly among plant species, with some rates comparable to those observed in animals [3]. In mammals, exon skipping (ExonS) is the most common type of AS [4, 5], but in At and Os, intron retention (IntronR) is the most abundant [1]. Alternative acceptor site (AltA) and alternative donor site (AltD) events are also common in these two model plants [1, 2]. A rare type of AS event is alternative position (AltP), where an alternative intron differs from its constitutive form in both donor and acceptor sites [1]. Examples of all five types of AS events are shown in Additional file 1 (Supplementary Figure S1). Recently, a novel approach involving whole-genome microarray data revealed that IntronR can be detected in ~8% of At genes [6]. The prevalence of IntronR events suggests that an intron recognition mechanism is predominant in At and Os [1]. A small fraction of conserved AS events have also been discovered and confirmed between At and Os, strongly indicating the functional importance of AS in plants [1].
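The five event types described above can be illustrated with a minimal sketch that compares genomic coordinates of introns and exons from two transcripts of the same gene. This is not the pipeline used in the cited studies; the function names, the 1-based inclusive coordinate convention, and the pairwise simplification are assumptions made purely for illustration.

```python
from typing import Optional, Tuple

# Hypothetical convention: 1-based inclusive (start, end) genomic coordinates.
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_intron_pair(a: Interval, b: Interval, strand: str = "+") -> Optional[str]:
    """Classify two overlapping introns as AltA, AltD, or AltP.

    On the '+' strand the donor site is the intron start and the acceptor
    site is the intron end; on the '-' strand the roles are swapped.
    Returns None for identical or non-overlapping introns.
    """
    if a == b or not overlaps(a, b):
        return None
    same_start = a[0] == b[0]
    same_end = a[1] == b[1]
    if strand == "+":
        same_donor, same_acceptor = same_start, same_end
    else:
        same_donor, same_acceptor = same_end, same_start
    if same_donor:
        return "AltA"  # shared donor, different acceptor
    if same_acceptor:
        return "AltD"  # shared acceptor, different donor
    return "AltP"      # both splice sites differ


def is_intron_retention(intron: Interval, exon: Interval) -> bool:
    """IntronR: an exon of another transcript spans the whole intron."""
    return exon[0] < intron[0] and intron[1] < exon[1]


def is_exon_skipping(exon: Interval, intron: Interval) -> bool:
    """ExonS: an intron of another transcript spans the whole exon."""
    return intron[0] < exon[0] and exon[1] < intron[1]
```

For example, `classify_intron_pair((100, 200), (100, 250))` reports a shared donor with a different acceptor on the forward strand, and the same coordinates on the reverse strand are reported as an alternative donor. A real analysis would also need to handle alignment quality, splice-site consensus, and overlapping gene models, none of which this sketch attempts.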
Most computational studies on AS in mammals and plants use transcript sequences from the same species as the genome sequence. For species with relatively small EST/cDNA collections, transcript sequences from closely related species can be a valuable resource for identification of additional AS events. Even for species with large EST collections, including human and mouse, cross-species EST alignments have been used to reveal novel AS events. As many as 42% of human genes show novel AS patterns when mouse transcripts are aligned to the human genome [7], and more than 10% of human loci exhibit conserved AS events in mouse [8]. Another study applying the cross-species strategy to human, mouse and rat identified 758 novel cassette-on exons (ExonS) as well as 167 novel retained introns (IntronR). RT-PCR validated 50–80% of tested events, indicating the impressive potential of the cross-species method in identifying novel AS events [9]. In plants, cross-species transcripts have been used mainly for gene annotation. For example, transcript assemblies from 185 species were mapped to the Os genome, confirming about 90% of gene predictions plus about 500 novel genes [10]. Similarly, approximately 850 novel genes and 1,000 novel AS events were annotated in Os by aligning ESTs from seven plant species [11]. The AS events supported by cross-species transcripts are likely to be functional, as they are conserved between species.
Experimental studies provide additional insight into the function of AS in plants. A wide range of plant genes with diverse functions are regulated through AS, including (but not limited to) genes involved in transcription, splicing, photosynthesis, disease resistance, stress, flowering and grain quality (reviewed in [12, 13]). Genes involved in splicing, especially in splicing regulation, seem to have a higher frequency of AS [14]. Several recent studies have revealed that serine/arginine-rich (SR) protein transcripts exhibit extensive levels of AS and that some AS patterns are conserved between At and Os [15-18]. Maize SR protein transcripts are also alternatively spliced [19, 20]. Temperature stress (cold and heat) as well as hormone treatment can change the AS patterns of SR proteins in At, suggesting an important role for AS in the stress response [15]. One At U2AF35 homolog (atU2AF35a) is alternatively spliced by removing non-canonical introns with repeated borders in the 3'-end of the coding region. Changing the expression of U2AF35 homologs alters the splicing pattern of the FCA gene and, in turn, causes variation in flowering time [21]. The U1-70K gene encodes a core protein in U1 small nuclear ribonucleoproteins (snRNP). The sixth intron of U1-70K can be retained in At [22], an event conserved between At and Os [1]. Recently, the IntronR event was experimentally confirmed in Os and maize [23].
Over 400 genes in 54 plant species are now known to be alternatively spliced [24]. Only a few AS events, however, have been reported in legumes (Fabaceae), one of the largest and most important plant families. In Lotus japonicus (Lj), a phytochelatin synthase gene (LjPCS2) can be alternatively spliced, with one isoform present in nodules (LjPCS2-7N) and another isoform in roots (LjPCS2-7R). The two isoforms encode proteins differing in only five amino acids, where one protein (LjPCS2-7N) confers cadmium (Cd) tolerance while the other does not, at least not when ectopically expressed in yeast cells [25]. A nodule-specific gene (LjNOD70) shows an IntronR event in Lj, where the spliced isoform is less abundant in nodules [26]. Six sucrose synthase genes exist in At, Os and Lj, but only the Lj homolog (LjSUS2) is alternatively spliced [27]. In soybean (Glycine max, Gm), a nodule-specific gene (GmPGN) has been identified through EST data mining. Experiments confirmed the tissue specificity and also revealed AS events for this gene [28]. In kidney bean (Phaseolus vulgaris), a single gene (PvSBE2) can be alternatively spliced to produce two starch-branching enzyme isoforms, each with distinct characteristics and subcellular localization [29]. A highly abundant novel giant retroelement (Orge) of pea (Pisum sativum) is partially spliced, probably regulating the ratio of full-length protein, as the retained intron causes truncation [30].
Two legume plants, Medicago truncatula (Mt) and L. japonicus (Lj), have large-scale genome sequencing projects in progress [31]. In late 2006, the Medicago genome sequence consortium (MGSC) constructed a partial genome assembly based on 1,996 Bacterial Artificial Chromosome (BAC) clone sequences as a basis for constructing draft pseudochromosomes. A total of 42,358 genes were annotated by the International Medicago Genome Annotation Group (IMGAG) [32], representing ~60% of all Mt genes. The data has been released as Mt1.0, available at [33]. In parallel, Lj has 1,394 Transformation-competent Artificial Chromosomes (TACs) in GenBank (as of mid-2006), with 488 of them at phase 3 (finished). Both legume model plants have relatively large EST collections (over 150,000 sequences). There are also large numbers of transcript sequences from other legume species, especially soybean. These features make Mt and Lj ideal for computational comparison of AS events in legumes and other plants.
In this study, all available transcript sequences from legumes were aligned to Mt and Lj BAC/TAC sequences. At and Os transcript sequences were also aligned to their own genome sequences for comparison. The frequency of alternatively spliced genes is very similar across the different plant species, provided that the number of ESTs used as the basis for analysis is standardized across species. In the case of Mt, about 10% of expressed genes are alternatively spliced at current EST coverage, with IntronR the most abundant type. Novel and conserved AS events can be identified if cross-species ESTs are aligned to the genome. These results provide a basis for analyzing AS events conserved in all plants as well as those found in legumes only. This is the first large-scale analysis of AS using EST-genome alignments in plants other than At and Os, and it is also the first detailed comparison using cross-species transcript sequences in plants.
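The point about standardizing EST numbers can be made concrete with a small sketch. The passage above does not say how the standardization was performed, so random subsampling to a fixed number of aligned ESTs is used here only as one plausible approach. The record layout (a gene identifier paired with a label for the splice pattern an EST supports) and the function name `as_frequency` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical record: (gene_id, splice_pattern_label) for one aligned EST.
EstRecord = Tuple[str, str]


def as_frequency(records: List[EstRecord], sample_size: int, seed: int = 0) -> float:
    """Estimate the fraction of expressed genes that are alternatively spliced.

    A fixed number of EST records is drawn at random so that species with
    different EST collection sizes are compared at equal sampling depth.
    A gene counts as expressed if at least one sampled EST maps to it, and
    as alternatively spliced if the sampled ESTs support two or more
    distinct splice patterns.
    """
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)  # raises ValueError if too few records

    patterns: Dict[str, Set[str]] = defaultdict(set)
    for gene_id, pattern in sample:
        patterns[gene_id].add(pattern)

    expressed = len(patterns)
    spliced = sum(1 for seen in patterns.values() if len(seen) > 1)
    return spliced / expressed if expressed else 0.0
```

Calling the function with the same `sample_size` for each species gives frequencies that can be compared directly. Repeating the draw with several seeds and averaging would reduce sampling noise.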