PLoS Comput Biol. 2012 November; 8(11): e1002747.
Published online 2012 November 1. doi:  10.1371/journal.pcbi.1002747
PMCID: PMC3486868

Theory on the Coupled Stochastic Dynamics of Transcription and Splice-Site Recognition

Rajamanickam Murugan 1 , 2 and Gabriel Kreiman 2 , 3 , 4 , *
Roderic Guigo, Editor

Abstract

Eukaryotic genes are typically split into exons that need to be spliced together to form the mature mRNA. The splicing process depends on the dynamics and interactions among transcription by the RNA polymerase II complex (RNAPII) and the spliceosomal complex consisting of multiple small nuclear ribonucleoproteins (snRNPs). Here we propose a biophysically plausible initial theory of splicing that aims to explain the effects of the stochastic dynamics of snRNPs on the splicing patterns of eukaryotic genes. We consider two different ways to model the dynamics of snRNPs: pure three-dimensional diffusion and a combination of three- and one-dimensional diffusion along the emerging pre-mRNA. Our theoretical analysis shows that there exists an optimum position of the splice sites on the growing pre-mRNA at which the time required for snRNPs to find the 5′ donor site is minimized. The minimization of the overall search time is achieved mainly via the increase in non-specific interactions between the snRNPs and the growing pre-mRNA. The theory further predicts that there exists an optimum transcript length that maximizes the probabilities for exons to interact with the snRNPs. We evaluate these theoretical predictions by considering human and mouse exon microarray data as well as RNAseq data from multiple different tissues. We observe that there is a broad optimum position of splice sites on the growing pre-mRNA and an optimum transcript length, which are roughly consistent with the theoretical predictions. The theoretical and experimental analyses suggest that there is a strong interaction between the dynamics of RNAPII and the stochastic nature of snRNP search for 5′ donor splicing sites.

Author Summary

The DNA encoding most eukaryotic genes is interrupted by long sequences called introns. These introns need to be removed through the process of splicing to produce the mature messenger RNA. The process of splicing plays a critical role in determining the exact amino acid content of the ensuing protein. Several molecules called small nuclear ribonucleoproteins (snRNPs) are involved in finding the appropriate 5′ donor splicing sites for splicing. Transcription and splicing occur simultaneously and the ultimate product depends on the relative speed of transcription and the stochastic dynamics underlying splicing. Here we propose a biophysically plausible theory that describes the ongoing interactions between transcription and splicing. We show that the theoretical predictions are consistent with experimental measurements of the abundance patterns of different exons and transcripts across tissues.

Introduction

Transcription of eukaryotic genes by the RNA polymerase II complex (RNAPII) produces a primary mRNA transcript (pre-mRNA) that contains both exons and introns. Introns are removed by splicing [1], [2], [3] via the assembly of a spliceosomal complex including small nuclear ribonucleoproteins (snRNPs) [4], [5], [6], [7]. Recent studies show that the majority of genes in higher eukaryotes are alternatively spliced and, therefore, contribute significantly to the structural as well as functional complexity and diversity of organisms [8], [9], [10]. The process of splicing can start as soon as the pre-mRNA begins to emerge from RNAPII. Cis-acting regulatory elements such as splicing enhancers and silencers generally determine the splicing pattern of a given multi-exonic gene, especially when transcription is not kinetically coupled to splicing [11], [12], [13], [14]. However, when transcription is coupled to splicing, inclusion or exclusion of an exon in the final transcript will also be strongly influenced by the transcription elongation rate as well as the local concentrations of various factors involved in the spliceosomal assembly and their interactions [15], [16], [17], [18].

Two basic models have been proposed to explain the various differences in the alternative splicing patterns of a given gene. According to the kinetic model [19], inclusion or exclusion of an exon in the final transcript is determined by the transcriptional elongation rate associated with the corresponding pre-mRNA in addition to the cis-acting regulatory elements. Exons are classified as ‘strong’ or ‘weak’ depending on whether they possess cis-acting regulatory elements associated with them or not. The inclusion of ‘strong’ exons is favored at higher transcriptional elongation rates whereas ‘weak’ exons may be included in the final transcript only when the transcriptional elongation rate is comparatively slower. Since the concentration of snRNPs in the vicinity of the transcriptional machinery is fixed under steady state conditions, a strong exon that has emerged recently from the transcriptional assembly will have a better chance of interacting with the snRNPs as compared to a weak exon that emerged earlier. Therefore, a weak exon will have a better chance to interact with the snRNPs only when there is a decrease in the rate or a pause in the transcriptional elongation process. According to the recruitment model [20], inclusion or exclusion of an exon is also decided by the interaction of the C-terminal domain (CTD) of RNAPII with a set of gene and exon specific DNA binding proteins and the snRNPs [19], [20] in addition to cis-acting regulatory elements. The CTD of the RNAPII interacts directly with the snRNPs and other factors, increasing the local concentrations of these factors in the vicinity of the emergence of a weak exon and thus enhancing the probability of weak exons to interact with the snRNPs.

There are four basic variables involved in the definition of an exon: (1) cis-acting regulatory elements [11], [12], [13]; (2) transcription elongation rate [19]; (3) interactions between the CTD of RNAPII and the snRNPs, hnRNPs and SR proteins [19], [20] (often referred to as ‘recruitment’); and (4) the stochastic dynamics involved in the recognition of the 5′ donor splice sites by U1 snRNPs while the pre-mRNA is emerging from the transcription assembly. Variables 1 and 3 are specific to each exon whereas variables 2 and 4 are generic and affect all the exons across various transcripts of an organism.

Most of the current splice pattern prediction algorithms consider mainly the cis-acting regulatory elements (variable 1) [21], [22], [23]; the kinetic model focuses on variable 2 [19] and the recruitment model considers mainly variable 3 [19], [20]. None of the current algorithms or models considers the stochastic dynamics associated with the snRNP search process (variable 4). Here we propose a biophysically plausible theory from first principles to describe the coupled dynamics of transcription and splicing. This work presents initial steps towards capturing the basic relationship between transcriptional elongation and splicing; the simplified model that we propose does not include multiple critical components that affect the splicing outcome, including cis-acting pre-mRNA sequence motifs, trans-acting interactions with different proteins and variable rates of RNAPII transcription. We focus on the stochastic dynamics whereby snRNPs locate the 5′ donor sites and how this search influences the outcome of splicing. We evaluate the theoretical predictions by analyzing expression data at the exon level from exon microarrays and RNAseq experiments across different tissues in mice and humans.

Results

A theoretical framework of coupled transcription and splicing

Recent single cell studies have revealed [24], [25], [26] that small nuclear ribonucleoproteins (snRNPs) and other splicing proteins diffuse freely throughout the volume of the various nuclear and splicing factor compartments within the eukaryotic cell nucleus. Splicing is kinetically coupled to transcription when the time required to generate a complete transcript is longer than the time required for the assembly and catalytic activity of the spliceosomal proteins. Under such coupled conditions, we must simultaneously consider at least two different types of dynamical processes: (i) transcription elongation by the RNA polymerase II transcription complex (RNAPII) and (ii) the search process whereby snRNPs locate the 5′ donor splicing sites (DSS) on the emerging pre-mRNA to initiate the spliceosomal assembly (Figure 1). The freely diffusing U1 snRNP can locate the donor splicing sites via two different types of mechanisms: a pure three-dimensional diffusion-controlled collision route (3D) and a combination of three-dimensional and one-dimensional diffusion dynamics as in the case of typical site-specific DNA-protein interactions (3D+1D) [27], [28], [29], [30]. Upon successful binding of the U1 snRNP molecule to the 5′ donor site, a cascade of molecular processes involving multiple snRNPs ensues, culminating in the formation of the spliceosomal complex and intron removal [1], [2], [3]. Except for the binding of U1 snRNPs at the 5′ donor site, all the other steps involve the hydrolysis of ATP. This means that the binding of U1 is a purely thermally driven process and here we focus on the dynamics involved in this rate-limiting step. All the other binding events and reactions, including transcription elongation, involve ATP hydrolysis; we therefore assume that the effects of thermally induced fluctuations are minimal in these reaction steps and ignore them while describing the search dynamics of snRNPs along the pre-mRNA. The overall probabilities associated with the interaction of snRNPs with various DSSs depend on the type of search mechanism followed by the snRNPs.

Figure 1
Schematic description of the various simultaneous processes that take place when splicing is coupled to transcription.

We start by considering the model illustrated in Figure 1, where the U1 snRNP has bound the emerging pre-mRNA via non-specific interactions facilitated by 3D diffusion and scans the concomitantly emerging pre-mRNA for the presence of DSSs via 1D diffusion. At a given time t, let y(t) denote the length of the emerging pre-mRNA and let x(t) denote the position of the non-specifically bound U1 snRNP on the pre-mRNA chain. The DSS under consideration is located at position x = n (DSSn), which has not been transcribed at time t (or is currently not reachable by the snRNP due to steric hindrance). Such coupled dynamics of snRNPs and RNAPII, represented by the set of dynamic position variables x and y ([equation image]) on the same pre-mRNA, can be described by the following set of Langevin type stochastic differential equations [31]:

equation image
(1)

The transcription elongation rate is denoted as kE (bases s⁻¹). xd (bases² s⁻¹) is the 1D diffusion coefficient associated with the searching dynamics of U1 snRNPs towards the DSSn, and [equation image] is the delta-correlated Gaussian white noise with [equation image] and [equation image]. The movement of RNAPII along y is energetically driven via the hydrolysis of ATPs. As a result, the fluctuations in y are negligible and we use a deterministic description for RNAPII in Eq. 1.

Let [equation image] denote the joint probability of finding the snRNPs at position x and RNAPII at position y at time t given initial conditions x0, y0. The Fokker-Planck equation associated with the temporal evolution of [equation image] can be written as follows [31]:

equation image
(2)

Here the initial condition is [equation image], ensuring that at time t0, the probability of finding x0 = 0, y0 = 0 is normalized to one. The boundary conditions are as follows:

equation image
(2′)

Here x = 0 as well as x = y (y<n) act as reflecting boundary conditions for the dynamics of snRNP. Whenever the snRNP tries to visit x≤0 or x≥y it is reflected back into x ∈ [0, y]. Here [equation image] acts as an absorbing boundary condition whenever [equation image].

Let [equation image] indicate the probability that RNAPII and snRNP are between position 0 and n at time t (given starting points x0, y0). Let [equation image] denote the mean first passage time (MFPT) associated with the binding of snRNP at DSSn starting from initial conditions (x0, y0). From the definition of MFPT, [equation image]. Noting that before time n/kE, the DSSn has not emerged yet, we have:

equation image

and therefore [equation image] obeys the following backward type Fokker-Planck equation [31]:

equation image
(3)

with the following boundary conditions:

equation image
(3′)

We assume that the residence time of the non-specifically bound snRNPs on the pre-mRNA (the time before they dissociate) is much longer than the time required by the snRNPs to locate the 5′ donor splicing sites. As a result, we have introduced a reflecting boundary condition at x = 0 in the first boundary condition. The other boundary conditions can be directly derived from Eq. 2′. The second boundary condition describes the conditions where RNAPII transcription elongation is the limiting step and the third boundary condition describes the conditions where snRNP diffusion is the limiting step. The particular solution to Eq. 3 for the boundary conditions in Eqns 3′ can be written as follows:

equation image
(4)

Considering x0 = 0 and y0 = 0 (both RNAPII and snRNP start at the origin), we have [equation image]. The first term is the time required to generate a pre-mRNA of n bases and the second term is the time required by the snRNPs to completely scan this pre-mRNA length via 1D diffusion. The validity of this equation for the MFPT under various values of n and kE is illustrated in Figure 2A–B using random walk simulations.
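
The kind of random walk check described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the lattice hopping rule, the time step `dt` and the function names are assumptions made for illustration, and the lattice diffusion coefficient `d` may differ from xd by a constant factor depending on the normalization used in Eq. 1. The walker (snRNP) hops between reflecting walls at 0 and at the current transcript length y(t) = kE·t, and is absorbed once site n has been transcribed and reached.

```python
import random

def first_passage_time(n, k_e, d, dt=1e-4, rng=None):
    """Time (s) for a lattice walker to reach site n on a growing chain.

    n   : position of the donor site (bases)
    k_e : elongation rate (bases/s); deterministic growth y(t) = k_e * t
    d   : lattice diffusion coefficient (bases^2/s). ASSUMED convention:
          in each time step the walker hops +1 or -1 with probability
          d*dt each, and otherwise stays in place.
    """
    rng = rng or random.Random()
    p_hop = d * dt
    if p_hop > 0.5:
        raise ValueError("dt too large: need d*dt <= 0.5")
    x, t = 0, 0.0
    while True:
        t += dt
        y = min(n, int(k_e * t))   # transcribed length, capped at n
        u = rng.random()
        if u < p_hop:
            x += 1
        elif u < 2.0 * p_hop:
            x -= 1
        x = max(0, min(x, y))      # reflecting walls at 0 and at y
        if y >= n and x >= n:      # site transcribed and reached
            return t

def mean_first_passage_time(n, k_e, d, runs=50, seed=0, dt=1e-4):
    """Average first passage time over independent walkers."""
    rng = random.Random(seed)
    times = [first_passage_time(n, k_e, d, dt, rng) for _ in range(runs)]
    return sum(times) / len(times)
```

Every sampled time is at least n/kE, because the site cannot be bound before it has been transcribed; the excess over n/kE is the scanning contribution.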

Figure 2
A–B. Validation of the expression for the mean first passage time (MFPT, in seconds) given by Eq. 4 (blue) using random walk simulations (red) at different elongation rates kE (A) and different positions of the absorbing boundary n (B).

In line with site-specific DNA-protein interactions [27], [28], [29], [30], we assume that snRNP molecules locate their respective DSS binding sites on the growing pre-mRNA via a combination of 1D and 3D diffusion-controlled collision routes. Under such conditions, from Eq. 4 we find the average overall search time (τS,1D3D) required by the snRNPs to locate DSSn (x0 = 0; y0 = 0):

equation image
(5)

Here [equation image] (units of seconds) is the 3D diffusion-controlled collision time required for non-specific binding of U1 snRNP with the pre-mRNA of length n. Eq. 5 suggests that there exists an optimum position of DSSn on the emerging pre-mRNA such that the search time required by the snRNPs to locate this DSSn will be a minimum. This optimum value can be obtained by solving [equation image] for n. The explicit real solution of the resulting cubic equation is:

equation image
(6)

where [equation image]. Upon substituting nopt in Eq. 5 we find the minimum search time [equation image].
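
The optimization step can also be carried out numerically. The following Python sketch is only an illustration under stated assumptions: the functional form in `search_time_1d3d` (a 3D collision term `tau_t / n`, the elongation term `n / k_e`, and a 1D scanning term `n**2 / (6 * x_d)`) is assumed here and is not copied from Eq. 5, and the function names and any parameter values passed to them are hypothetical. Under this assumed form, setting the derivative with respect to n to zero gives a cubic in n, which the sketch solves by bisection rather than in closed form.

```python
def search_time_1d3d(n, tau_t, k_e, x_d):
    """ASSUMED form of the combined 1D+3D search time (seconds).

    tau_t / n        : 3D collision time shrinking with target length n
    n / k_e          : time to transcribe n bases
    n**2 / (6 * x_d) : 1D scanning time over n bases
    """
    return tau_t / n + n / k_e + n ** 2 / (6.0 * x_d)

def optimum_position(tau_t, k_e, x_d, lo=1.0, hi=1e9, iters=200):
    """Bisection on the derivative of the assumed search time.

    The derivative -tau_t/n**2 + 1/k_e + n/(3*x_d) increases
    monotonically with n, so it has at most one positive root;
    [lo, hi] must bracket that root.
    """
    def slope(n):
        return -tau_t / n ** 2 + 1.0 / k_e + n / (3.0 * x_d)

    if slope(lo) > 0 or slope(hi) < 0:
        raise ValueError("[lo, hi] does not bracket the optimum")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Because the assumed search time is convex in n, the root returned by `optimum_position` is its global minimum; a different functional form would need its own derivative in `slope`.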

In line with the prediction of the kinetic model, when the snRNPs locate the DSSn via a purely 3D diffusion-controlled collision route, the overall search time is:

equation image
(7)

In this equation, c (units of bases) is the sequence length within which the snRNPs can be captured at the 5′ donor site. A precise and tight binding would correspond to c = 1. Upon comparing this expression with Eq. 5 we find that there exists a critical position on the pre-mRNA (nc) such that τS,1D3D = τS,3D. Solving the cubic equation [equation image] for n gives (Figure 2C):

equation image
(8)

where [equation image].

While deriving Eq. 5 we have assumed that the non-specifically bound snRNP does not dissociate from the pre-mRNA chain until it reaches DSSn. We relax this assumption by modeling the search dynamics of snRNPs as multiple cycles of dissociation-scan-association events. In this modified version of the model, the non-specifically bound snRNP can dissociate after scanning an average pre-mRNA length of L bases and then re-associate at the same or a different location on the pre-mRNA chain. In this way, snRNPs are required to undergo at least (n/L) such association/dissociation events to scan the entire length of n bases. Under such conditions, the expression for the overall search time ([equation image]) can be written as follows:

equation image
(9)

Here L²/6xd is the average time required by the non-specifically bound snRNPs to scan an average of L bases of pre-mRNA before the dissociation event. The scan length L depends on the magnitude of the interaction between the snRNPs and the pre-mRNA. When L = n, Eq. 9 reduces to Eq. 5. When [equation image], there exists an optimum value of L in Eq. 9 at which [equation image] is a minimum: [equation image]. The corresponding minimum achievable search time is:

equation image
(10)

One should note that the optimum 1D scanning length can be achieved by the diffusing U1 snRNPs only when the inequality condition [equation image] holds since by definition [equation image]. Further analysis shows that [equation image] will reach a minimum only when [equation image]. Upon comparing Eqns 5, 7 and 9 we find that when n<nc, then both [equation image] and [equation image] will be lower than [equation image]. In the range L ∈ (0, nopt) the cubic equation [equation image] has two real solutions for n (n1~L and n2, marked in Figure 2C). When n ∈ (L, n2), we find that [equation image]. The relationship among these different search times is shown in Figure 2C. These results suggest that among the three possible modes of searching (pure 3D, 1D3D with multiple dissociations and 1D3D without dissociation), the 1D3D search mode without any dissociation event will be the most efficient and preferable one in the range n ∈ (L, n2), where L is the possible 1D scanning length associated with diffusion of U1 snRNPs along the emerging pre-mRNAs. We find from Eqs. 9–10 that, similar to the pure 3D diffusion mediated search time (τS,3D), [equation image] is also a monotonically increasing function of n.

On the macroscopic level, the interactions of snRNPs with DSSn can be described by the following chemical reaction scheme I:

equation image
(Scheme I)

Here [equation image] (bases⁻¹ s⁻¹) is the bimolecular type forward on-rate constant associated with the site-specific interaction of snRNP with the DSSn and [equation image] (s⁻¹) is the respective dissociation or off-rate constant. The sequence of DSSn plays a critical role in determining the value of the off-rate. The number of snRNPs will be higher than the number of DSSs of a particular pre-mRNA transcript. In this situation, the thermodynamic probability of finding DSSn bound with snRNPs (pn,1D3D) is:

equation image
(11)

Here N0 is the total number of the freely diffusing snRNPs inside the nucleus. It follows from Eqns 5–6 that the probability pn,1D3D is maximized when n = nopt irrespective of the value of the intranuclear concentrations of snRNPs or the amount of time for which the completely transcribed pre-mRNA chain stays inside the nuclear compartment for further post-transcriptional processing. On the other hand, when the snRNP search mode is purely via 3D routes then the probability (pn,3D) is a monotonically decreasing function of n (Figure 2D):

equation image
(12)

From Eqs 11–12, we find [equation image] (all DSSn bound by the snRNP given infinite concentration). Those splicing sites located closer to the optimum position (nopt) approach this limit faster. Using Eq 11 we define the overall splicing efficiency of a transcript of length n as follows:

equation image
(13)

The value of the splicing efficiency [equation image] (between 0 and 100%) indicates how well exons present in a given pre-mRNA transcript of length n interact with the available pool of snRNPs, are subsequently spliced and hence get included in the final transcript. This means that the overall levels of the final transcript should be directly proportional to this splicing efficiency. There exists an optimum length of pre-mRNA transcript (μ) at which [equation image] achieves a maximum. The optimum μ can be obtained by numerically solving [equation image] for n. The overall level of the final transcript will be maximum at [equation image] since the overall average probabilities associated with all those exons of the given pre-mRNA transcript of length μ to interact with the available snRNPs will be a maximum. We consider a transcript c of length n and its expression in tissue k. We define the overall signal as [equation image] where [equation image] is the signal from the exon located at position i in transcript c in tissue k. With this definition we find that the maximum gene signal value of n occurs at [equation image] which means that when [equation image] the equality [equation image] holds. This follows from the fact that [equation image].

Comparison with experimental data

We compare the theoretical predictions outlined in the previous section with two different types of experimental measurements: (i) experiments based on exon microarray data and (ii) experiments based on high-throughput RNA sequencing data (RNAseq) (“Materials and Methods”). Upon substituting the parameters τt, kE and xd into Eq. 6 for the optimum position of the DSS on the pre-mRNA we find [equation image] bases and the minimum achievable overall search time required by the snRNPs [equation image]. This search time is significantly higher than physiologically relevant timescales (for example, the cell's generation time). One should note that this higher timescale corresponds to the interaction of a single snRNP molecule with a single splicing site. The search time will be proportionately scaled up/down depending on the number of freely available snRNPs and nascent splicing sites inside the nucleus as [equation image]. There are ~2×10⁴ genes in the human genome, and there are on average ~10 exons per gene. This means that there are d0~4×10³ such splicing sites at any given active region of the chromosome (corresponding to ~1% of the total pre-mRNAs being processed). With these values we find [equation image]. These results suggest that the appearance of the speckles where snRNPs are concentrated inside the nucleoplasm of higher eukaryotes is mainly to scale down the search time required by snRNPs to locate the splicing sites on the pre-mRNA.

We conclude from the expression for the probability of finding the snRNP at position n (pn,1D3D, Eq. 11) that the DSS located at position [equation image] of the growing pre-mRNA will have more chances to interact with the available snRNPs. Here the minimization of the overall search time τS,1D3D is achieved mainly via the enhancing effects of the increasing numbers of non-specific interactions of snRNPs with the growing pre-mRNA. We learn from Eq. 8 that the inequality condition [equation image] will hold whenever [equation image]. The current parameter settings yield [equation image] bases. Various single-cell studies using fluorescence recovery after photobleaching (FRAP) provide an empirical estimate for the dissociation rate of snRNPs from the pre-mRNA chain: [equation image] [24], [25], [26]. This is an overall off-rate that includes dissociation of snRNPs from both the non-specific and specific binding sites (the off-rate of snRNPs from the splicing sites will be lower than the off-rate from non-specific binding sites). Using this value of [equation image], the limiting behavior of pn,1D3D and pn,3D as [equation image] is demonstrated in Figure 2D. This figure suggests that the optimum position of DSS will spread into a wider range as the total concentration of snRNPs increases inside the nucleoplasm. Single molecule studies suggest an average 1D scanning length of L~100 bases for the DNA-binding proteins under in vivo conditions [32]. With this value, upon solving the cubic equation [equation image] for n we find that n1 = 100 and n2 = 2×10⁶ bases. Since within this range [equation image], this result suggests that the dominating mode of searching of U1 snRNPs for the 5′ splicing sites is likely to be via the combination of 1D and 3D without dissociation for most of the pre-mRNAs.

We considered microarray data evaluating exon levels in different tissues and species (Materials and Methods). Examples of mouse and human constitutively spliced multi-exonic genes across various tissues are shown in Figure 3A–B. These examples, identified using the ranking metric defined in Eq. 14, suggest that there exists a broad optimum position of splicing sites on the pre-mRNA at which the probability associated with the inclusion of the associated exon is maximized. This position is approximately independent of the tissue analyzed. In these particular mouse and human genes (Dtnb dystrobrevin beta in mouse and VIT vitrin in human), this optimum exon number occurs at the pre-mRNA position of n~5×10⁴ to 10⁵ bases (arrow in Figure 3A–B). Other examples are included in supplementary materials (Figure S1, S2). The position of the maximum splicing index value, independently of the tissue, occurs around nopt~7×10⁴ bases as predicted by Eq. 6, with an error margin of ~25%.

Figure 3
A. Example showing the splicing index (σε) as a function of the annotated exon number ε in mouse gene Dtnb (dystrobrevin beta, NM_007886, Affymetrix Transcript ID: 6792942).

Overall analysis of the multi-exonic genes present in both human and mouse genomes revealed an average intron length of ~4×10³ bases with a median of ~10³ bases. Here the average length of exons is ~2×10² bases with a median of ~10² bases. Results of a genome-wide analysis of the median exon positions on human and mouse pre-mRNAs are shown in Figure 3C–D, which reveals the following approximate scaling relationships between the positions (n) and the exon numbers (ε):

equation image

The standard error (SE) of this transformation is approximately 5 to 25% of the mean (n) for ε in the range 1 to 100 (Figure 3C–D). This suggests that, upon genome-wide averaging across exon numbers ε, the optimum position n_opt and the minimum search time min τ_{S,1D3D} may be observed anywhere within ±25% of the predicted values.
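As a rough illustration of how an exon number can be mapped onto a pre-mRNA position, the sketch below spaces exons linearly using the mean exon and intron lengths quoted above. The linear form and the function name `approx_position` are assumptions made for illustration only; this is not the fitted scaling relationship of Figure 3C–D.

```python
MEAN_EXON_LENGTH = 2e2    # bases, average exon length quoted in the text
MEAN_INTRON_LENGTH = 4e3  # bases, average intron length quoted in the text


def approx_position(exon_number, exon_len=MEAN_EXON_LENGTH,
                    intron_len=MEAN_INTRON_LENGTH):
    """Crude estimate of the donor-site position (in bases) of an exon.

    Assumption (not from the paper): every exon and intron has the mean
    length, so the donor site at the 3' end of exon number `exon_number`
    is preceded by `exon_number` exons and `exon_number - 1` introns.
    """
    if exon_number < 1:
        raise ValueError("exon numbers are 1-based")
    return exon_number * exon_len + (exon_number - 1) * intron_len


if __name__ == "__main__":
    for eps in (1, 11, 20):
        print(eps, approx_position(eps))
```

With these mean lengths, exon 11 lands near 4.2×10^4 bases and exon 20 near 8×10^4 bases. Because the length distributions are skewed (the medians are well below the means), this linear estimate should be read only as an order-of-magnitude guide.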

The computed first exon normalized average signal (FENAS, defined in Eq. 15) associated with various mouse tissues (kidney, brain, liver, muscle and heart) and human tissues (cerebellum, kidney, liver, heart, muscle and normal and cancerous colon) is shown in Figure 4A–B. This figure indicates a maximum at approximately [equation image]. This value corresponds to the optimum position of the Affymetrix annotated exon on the pre-mRNA at [equation image] bases, which is broadly consistent with our theoretical predictions. We also compared the theoretical predictions with experimental data obtained from RNAseq experiments (Materials and Methods). The exon level and transcript level signals obtained from RNAseq data of mouse brain and human 293T cells are shown in Figure 4C–D. The results from the RNAseq data are comparable to those from the microarray data and also reflect an optimum exon position, approximately around [equation image].

Figure 4
A–B. First exon normalized average signal for exon ε and tissue k (f_{ε,k}, FENAS measured as defined in Eq. 15).

Upon substituting [equation image] molecules, [equation image] s^−1 and the empirical values of τ_t, k_E and x_d into Eq. 13 and numerically solving it for the optimum transcript length n = μ we find [equation image] bases (Figure 5). This value corresponds to approximately [equation image] exons. From the theoretical analysis, we learn that the overall transcript signal of a given gene is maximized when the number of exons in that gene is close to this value. We find from Figure 5 that the splicing efficiency is >95% whenever the length of the pre-mRNA transcript falls within the range of ~10^2 to 10^7 bases. The distribution of transcript lengths in both human and mouse is well within this broad range. Furthermore, we calculated the genome level averaged transcript signal across various mouse and human tissues using Eq. 16. Figure 6 suggests that there is a broad maximum in the transcript signal approximately centered around [equation image], based both on the microarray data (Figure 6A–B) and on the RNAseq data (Figure 6C–D). Within the expected error range of ±25%, these distributions and the location of the maxima are consistent with the theoretical predictions.

Figure 5
Overall splicing efficiency S_{s,n} as a function of the transcript length n as defined in Eq. 13.

Figure 6
A–B. Genome-wide normalized average level of transcripts with m exons in the kth tissue (h_{m,k}, Eq. 16) in mouse (A) and human (B).

To further evaluate whether the experimental data are consistent with the existence of optimal exon positions, we computed the distribution of FENAS values for two separate broad ranges: (1) [equation image] (i.e. around the theoretical optimum) and (2) [equation image] or [equation image] (i.e. far from the theoretical optimum). The distributions of FENAS signals were significantly different for these two ranges (t-test, p<0.05, Figure 7).
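The comparison of two groups of FENAS values can be sketched as follows. The text does not state which variant of the t-test was used, so the unequal-variance (Welch) form below is an assumption, as are the function name `welch_t` and the toy inputs. Only the statistic and the degrees of freedom are computed; a p-value would additionally require the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    `a` and `b` are sequences of FENAS values (hypothetical inputs), each
    with at least two observations.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)      # sample variances (n - 1)
    se2 = va / na + vb / nb                # squared standard error
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    near_optimum = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy values, not real data
    far_from_optimum = [2.0, 3.0, 4.0, 5.0, 6.0]
    print(welch_t(near_optimum, far_from_optimum))
```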

Figure 7
A. Distribution of transcript lengths based on the annotations (Materials and Methods).

Discussion

While the RNA polymerase II complex (RNAPII) is producing the pre-mRNA, multiple splicing factors diffuse inside the nucleus and initiate the recognition steps required for splicing. Therefore, the ultimate mature mRNA product depends on several variables that affect the kinetics of these chemical and diffusion processes. These variables include the RNAPII elongation speed and the presence of pausing events during transcription, the steric availability of splicing signals along the emerging pre-mRNA, exon and intron lengths, the abundance of different splicing factors, and the sequence of the splicing signals and hence their affinity for the splicing factors. Here we develop a simple theoretical framework that aims to capture the key interactions between transcriptional elongation and splicing.

The biophysical model proposed here can explain the effects of the stochastic search dynamics of small nuclear ribonucleoproteins (snRNPs) on the splicing pattern of eukaryotic genes. We considered two different ways to model the dynamics of snRNPs in the process of locating the splice sites on the concomitantly growing pre-mRNA: a pure three-dimensional diffusion process and a combination of three- and one-dimensional diffusion along the pre-mRNA. Our theoretical analysis of the coupled dynamics of transcription elongation and splicing revealed that there is an optimum position of the splice sites on the growing pre-mRNA at which the time for snRNP binding is minimized (Figure 2). The minimization of the overall search time is achieved mainly by increasing the non-specific interactions between the RNA binding domains of snRNPs and the pre-mRNA. The theory further revealed that there is an optimum transcript length that maximizes the sum of the probabilities for the exons in the transcript to interact with the snRNPs. This suggests that the overall transcript signal should be maximized at this transcript length.

We evaluated the theoretical predictions by analyzing exon microarray data from various mouse and human tissues (Figures 3–6). The empirical data revealed that the optimum position of the splice sites on the growing pre-mRNA occurs at ~4.5×10^4 bases and the optimum length of the transcript occurs at ~7.5×10^4 bases (corresponding approximately to the ~11th and ~20th exon in the genome-wide first exon normalized average signal space). The empirical data are broadly consistent with the theoretical predictions and the model captures, to a first approximation, some of the variability in exon level signals and splicing patterns.

Several computational algorithms have been developed to predict splicing patterns from DNA sequence. Most current splicing pattern prediction algorithms are based solely on cis-acting regulatory elements [21], [22], [23]. Typically each exon of a given pre-mRNA transcript is assigned a score depending on the presence or absence of exonic and intronic enhancer or silencer elements and their degree of conservation across different species [31]. Using these exon level scores, transcript level scores are computed. Our work points out that, before the exonic scores for the presence of cis-acting elements are computed, the ‘backbone’ of the scoring scheme assumes that all exons are probabilistically equivalent. This uniform distribution of exon probabilities may hold only when the snRNP search mode is pure 3D diffusion (Figure 2D) or the nuclear concentration of snRNPs is infinite. In more general scenarios, our theoretical model suggests that, instead of a uniform distribution, the backbone of the scoring scheme should be given by the probability functional defined in Eqs. 12–13. In other words, the backbone of the scoring scheme is determined by the generic variables 2 (transcription elongation rate), 3 (interactions between RNAPII and snRNPs) and 4 (stochastic dynamics of snRNP search processes) as highlighted in the introduction. The model suggests that a modified scoring scheme would include a background model that accounts for the coupled kinetics of transcription and splicing in addition to the exonic scores for the presence of cis-acting regulatory elements.

The theoretical framework presented here provides initial steps to describe the coupled chemical and diffusion processes that underlie transcription and splicing. While we focused here on generic variables that affect all transcripts and genes, much of the transcript-to-transcript and gene-to-gene variability depends on sequence-specific factors, gene-specific transcription pausing events, regulation of transcriptional termination and the speed at which the mRNA is transported to the cytoplasm. The theory proposed here constitutes a starting point for building more sophisticated models that incorporate important aspects of the biology not considered in this initial examination.

Materials and Methods

Datasets

To compare our theoretical predictions with experimental observations, we considered two different types of publicly available data: (i) exon microarray data and (ii) RNAseq data.

Exon microarray data

We analyzed mouse and human exon microarray data collected using Affymetrix arrays [33], [34]. We used exon level signal data collected in triplicate from five different mouse tissues (brain, kidney, muscle, liver and heart; mouse Mo-Ex 1.0) and five different human tissues (cerebellum, kidney, muscle, liver, heart; human Hu-Ex 1.0). We also considered the available sample microarray data from normal and cancerous human colon [33], [34].

RNAseq data

We analyzed Bowtie-generated RNAseq datasets [35], [36]. The datasets come from mouse brain (GSM672532, GSM672537, GSM672528, GSM672534 and GSM672547) and human 293T cells (GSM860026, GSM860020, GSM860017, GSM860001 and GSM9685994). The mouse annotations are based on the mm8 genome build and the human annotations on the hg18 genome build; the data were obtained from the GEO database [37], [38]. We used the information on sequence type annotation, sequence, and genomic alignment from the GEO files.

Preprocessing of raw data

Experimental artifacts are introduced in the exon microarray data by factors such as cross-hybridizing probes, signal heterogeneity due to variation in the base composition of probes and signal variation due to fluctuations in the spot size of probes during microarray design. The cross-hybridization problem was addressed by removing those probes showing hybridization at more than one location. Since the variations in probe level signals due to base composition, spot size and RT reaction are approximately random in nature, we assume that these errors are ameliorated by averaging over the scale-normalized and background-subtracted probe level signals of a probe set id, exon cluster id or transcript cluster id.
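The filtering and averaging step described above can be sketched as follows. The input layout (tuples of probe set id, number of genomic hybridization locations, and normalized signal) and the function name `probe_set_signals` are hypothetical and chosen only for illustration; they do not correspond to an Affymetrix file format.

```python
from collections import defaultdict
from statistics import mean


def probe_set_signals(probes):
    """Average probe signals per probe set after removing cross-hybridizers.

    `probes` is an iterable of (probe_set_id, n_locations, signal) tuples,
    where `signal` is assumed to be already scale-normalized and
    background-subtracted (hypothetical layout).
    """
    grouped = defaultdict(list)
    for probe_set_id, n_locations, signal in probes:
        if n_locations == 1:  # keep only probes hybridizing at one location
            grouped[probe_set_id].append(signal)
    return {ps: mean(values) for ps, values in grouped.items()}


if __name__ == "__main__":
    toy = [("ps1", 1, 2.0), ("ps1", 1, 4.0), ("ps1", 3, 100.0), ("ps2", 1, 5.0)]
    print(probe_set_signals(toy))
```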

Exon level analysis

Exon level signals are computed by averaging the probe-set id level signals contained in an exon-cluster id, and transcript level signals are computed by averaging the exon level signals contained in a transcript cluster id. Only the RefSeq annotated transcript cluster ids were considered in all subsequent calculations. We used the standard Tukey biweight algorithm [39] to remove the outlier probe signals before computing the average. We considered multiple transcripts (indexed by c) and different tissues (indexed by k). Let s_{ε,c,k} denote the log_2 of the expression level of the εth exon in transcript number c and tissue number k. The relative probability [equation image] associated with the εth exon to get included in the final transcript was defined as [equation image], where m_c is the total number of exons in transcript c. The probability [equation image] is directly related to the splicing index ([equation image]) of the associated exon, which is a measure of the extent of alternative splicing in that transcript, defined as [equation image], where g_{c,k} is the overall level of transcript c in tissue k. In addition to the stochastic component, other splicing variables such as the presence of cis-acting regulatory elements, including splicing enhancers and suppressors, can significantly modify the probabilities defined here.
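A minimal sketch of the robust averaging step is given below. It uses the one-step Tukey biweight in the form commonly applied to Affymetrix probe signals; the tuning constants `c = 5` and `eps = 1e-4` are assumptions, since the text only cites the algorithm, and the helper `exon_and_transcript_signal` is a hypothetical name.

```python
from statistics import mean, median


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate.

    Points far from the median (relative to the median absolute deviation)
    receive zero weight, so outlier probe signals do not affect the average.
    """
    values = list(values)
    m = median(values)
    mad = median([abs(v - m) for v in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - m) / (c * mad + eps)
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


def exon_and_transcript_signal(probe_sets_by_exon):
    """Exon signals from probe-set signals, then the transcript signal.

    `probe_sets_by_exon` is a list with one list of probe-set signals per
    exon (hypothetical layout). The transcript signal is the plain mean of
    the exon signals.
    """
    exon_signals = [tukey_biweight(ps) for ps in probe_sets_by_exon]
    return exon_signals, mean(exon_signals)


if __name__ == "__main__":
    print(exon_and_transcript_signal([[1.0, 1.0, 1.0, 1.0, 100.0],
                                      [3.0, 3.0, 3.0]]))
```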

To evaluate the expressions derived in Eqs. (11–12), we need a splicing probability profile of a pre-mRNA transcript that contains multiple exons spliced in a ‘constitutive’ manner across various tissues. Here we use the term ‘constitutive splicing’ to indicate a splicing pattern of a given pre-mRNA that is conserved across various tissues in a given organism. We use the following variance-based scoring metric to rank and select such constitutive transcripts from the pool of multi-exonic pre-mRNAs of a given genome:

equation image
(14)

We ranked the transcripts based on [equation image] and considered the top 25 transcripts to evaluate the theoretical predictions (these 25 transcripts are the ones with minimal variation in the splicing index across different tissues, as defined by the index [equation image]). For a single-exon transcript, [equation image]. Earlier studies show that the majority of multi-exonic pre-mRNAs are spliced alternatively [21], [23]. This suggests that the number of constitutively spliced examples available to evaluate our model is limited.
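The ranking idea can be illustrated with a stand-in score. The exact form of Eq. 14 is not reproduced here, so the score below (the across-tissue variance of the splicing index, averaged over exons) is an assumption used only to show the select-the-least-variable logic; `constitutive_score` and `rank_constitutive` are hypothetical names.

```python
from statistics import mean, pvariance


def constitutive_score(splicing_index):
    """Stand-in variability score for one transcript.

    `splicing_index[k][e]` is the splicing index of exon e in tissue k.
    Lower scores mean a splicing profile that is more conserved across
    tissues. This is NOT Eq. 14, only an illustrative substitute.
    """
    n_exons = len(splicing_index[0])
    per_exon = [pvariance([tissue[e] for tissue in splicing_index])
                for e in range(n_exons)]
    return mean(per_exon)


def rank_constitutive(transcripts, top=25):
    """Return the ids of the `top` transcripts with the smallest score."""
    ranked = sorted(transcripts, key=lambda tid: constitutive_score(transcripts[tid]))
    return ranked[:top]


if __name__ == "__main__":
    toy = {
        "flat": [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]],      # identical in 3 tissues
        "variable": [[1.0, 2.0], [2.0, 4.0], [3.0, 0.0]],  # changes by tissue
    }
    print(rank_constitutive(toy, top=1))
```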

We assume that the effects of cis-acting elements associated with a given exon number across the genes of the genome behave approximately as a symmetric random variable. That is, we assume that cis-acting enhancer and silencer elements are found on the genome with equal probabilities. Under this assumption, we expect that averaging the first exon normalized signals (FENAS) of a given exon number across all the available multi-exonic genes in the genome of an organism will essentially cancel the up- and down-regulatory effects of the cis-acting elements, apart from a local normalization of the exon signals within a gene. In this averaging process, the start and stop positions of each εth exon on the pre-mRNAs of different gene transcripts are also averaged, so that in the overall averaged signal space exons of average length are equally separated, or flanked, by introns of the average length for the genome. We define the FENAS metric as follows:

equation image
(15)

Here [equation image] is the genome level FENAS (±%) of the εth exon in tissue k. To compare Eq. (15) with Eqs. (11–12), we use the genome-wide scaling [equation image], that is, the position of DSS_n is a function of the exon number ε ([equation image]). We note that [equation image] and [equation image]. To evaluate Eqs. (11–12), the average signals associated with the final transcripts with various numbers of exons at the genome level were calculated as follows:

equation image
(16)

Here [equation image] is the genome level average signal of those transcripts with m exons in the kth tissue; b(m) is the total number of transcripts with m exons.
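The two genome-level averages can be sketched for a single tissue as follows. The transcript average follows the verbal definition above (the mean signal over the b(m) transcripts with m exons). The FENAS normalization, however, is an assumption: each exon signal is simply divided by the first exon signal of the same transcript, which may differ from the exact form of Eq. 15. Function names and input layouts are hypothetical.

```python
from collections import defaultdict


def fenas(exon_signals_per_transcript):
    """Average first-exon-normalized signal for each exon number.

    `exon_signals_per_transcript` holds one list [s_1, s_2, ...] per
    transcript, with a non-zero first exon signal. Returns {exon_number:
    average of s_eps / s_1 over the transcripts that contain that exon}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for signals in exon_signals_per_transcript:
        first = signals[0]
        for eps, s in enumerate(signals, start=1):
            sums[eps] += s / first
            counts[eps] += 1
    return {eps: sums[eps] / counts[eps] for eps in sums}


def transcript_average_by_exon_count(transcripts):
    """Return {m: (h_m, b_m)} from (number_of_exons, signal) pairs.

    h_m is the mean signal of transcripts with m exons and b_m is the
    number of such transcripts.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for m, g in transcripts:
        sums[m] += g
        counts[m] += 1
    return {m: (sums[m] / counts[m], counts[m]) for m in sums}


if __name__ == "__main__":
    print(fenas([[2.0, 4.0], [4.0, 4.0, 2.0]]))
    print(transcript_average_by_exon_count([(2, 1.0), (2, 3.0), (3, 5.0)]))
```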

Analysis of RNAseq data

Exon microarrays possess very few probe sets per exon cluster id. Therefore, we also analyzed the number of sequence reads from RNAseq data (see Datasets above). For this purpose, we considered the start and end positions of each exon and summed the number of RNAseq reads falling within this range. These signal profiles were used to compute the first exon normalized average signals (FENAS) as described in Eq. 15. To compute the transcript level signal, we considered the start and stop positions of each transcript and summed the number of RNAseq reads within this range.
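Counting reads inside annotated intervals can be sketched as below. The sketch assumes a single chromosome and strand, inclusive coordinates, and that each read is assigned by its alignment start position; these conventions and the function names are illustrative assumptions, not details taken from the text.

```python
from bisect import bisect_left, bisect_right


def count_reads(sorted_read_starts, start, end):
    """Number of reads whose start lies in the inclusive range [start, end].

    `sorted_read_starts` must be sorted in ascending order.
    """
    return bisect_right(sorted_read_starts, end) - bisect_left(sorted_read_starts, start)


def interval_read_counts(read_starts, intervals):
    """Read counts for a list of (start, end) intervals (exons or transcripts)."""
    ordered = sorted(read_starts)
    return [count_reads(ordered, start, end) for start, end in intervals]


if __name__ == "__main__":
    reads = [35, 10, 5, 20, 10]                 # toy alignment start positions
    exons = [(1, 9), (10, 20), (30, 40)]        # toy exon coordinates
    print(interval_read_counts(reads, exons))   # per-exon counts
    print(interval_read_counts(reads, [(1, 40)]))  # whole-transcript count
```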

Parameter estimation from experimental data

In order to compare the theoretical predictions with experimental measurements, we estimated from experimental studies the kinetic and diffusion parameters required to quantitatively evaluate the theoretical equations. Single molecule data from the human U2OS osteosarcoma cell line show an in vivo transcription elongation rate for RNAPII of [equation image] bases s^−1 [40]. Single cell studies on BAC HeLa and E3 U2OS cell lines suggest that the overall diffusion coefficient for the U1-70K snRNP inside the nuclear splicing region is on the order of [equation image] µm^2/s (~8×10^6 bases^2 s^−1) [24], [25], [26]. This value is close to the 3D diffusion coefficient associated with the dynamics of protein molecules inside the cytoplasm of prokaryotic systems [32]. The 1D diffusion coefficient associated with the diffusion dynamics of snRNPs on the pre-mRNA chain is not clearly known. Single molecule studies in E. coli [40] showed a numerical value of [equation image] bases^2/s (~0.092 µm^2/s) for the 1D diffusion coefficient associated with the dynamics of transcription factors along the DNA. This value is approximately 10 times smaller than the experimentally observed overall diffusion coefficient of U1 snRNP inside the nucleus. The faster diffusion of the snRNP can be attributed to the more flexible nature of single stranded pre-mRNAs compared to the double stranded DNA chain. The nuclear diameter of a typical human cell is ~6 µm and the corresponding volume is ~10^−16 m^3. The concentration of a single snRNP molecule, or of its single DSS binding site on the pre-mRNA, in this volume is ~20 pM. When the length of the pre-mRNA is n bases, there should be at least ~n non-specific binding sites for snRNPs. Single cell experimental studies suggest that the timescale required by the snRNPs to non-specifically interact with the pre-mRNA is ~0.1 s [24], [25], [26]. This value suggests an overall off-rate [equation image]. There are approximately N_0 ~ 10^8 snRNPs inside the nuclear volume [41], which means that the number of non-specific collisions that can happen between a single snRNP molecule and the growing pre-mRNA chain will be on the order of [equation image].
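The order-of-magnitude estimates in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes a spherical nucleus and a contour length of 0.34 nm per base for converting between µm^2/s and bases^2/s; both are assumptions introduced here for illustration, as are the function names.

```python
from math import pi

AVOGADRO = 6.02214076e23   # molecules per mole
BASE_LENGTH_UM = 3.4e-4    # assumed length per base (0.34 nm) in micrometres


def sphere_volume_m3(diameter_um):
    """Volume in cubic metres of a sphere with the given diameter in µm."""
    radius_m = diameter_um * 1e-6 / 2.0
    return 4.0 / 3.0 * pi * radius_m ** 3


def single_molecule_concentration_pM(volume_m3):
    """Concentration in pM of one molecule in the given volume."""
    litres = volume_m3 * 1e3
    return 1.0 / (AVOGADRO * litres) * 1e12


def um2_to_bases2(d_um2_per_s, base_length_um=BASE_LENGTH_UM):
    """Convert a diffusion coefficient from µm^2/s to bases^2/s."""
    return d_um2_per_s / base_length_um ** 2


if __name__ == "__main__":
    volume = sphere_volume_m3(6.0)
    print(volume)                                   # nuclear volume in m^3
    print(single_molecule_concentration_pM(volume))  # pM for one molecule
    print(um2_to_bases2(0.092))                     # 1D coefficient in bases^2/s
```

Under these assumptions, a 6 µm nucleus has a volume of about 1.1×10^−16 m^3 and a single molecule corresponds to roughly 15 pM, the same order of magnitude as the values quoted above; 0.092 µm^2/s converts to about 8×10^5 bases^2/s.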

Supporting Information

Figure S1

Mouse. This supplementary figure provides further examples showing the splicing index as a function of the annotated exon number (the format is the same as the one in Figure 3A ; see Figure 3A caption for details). A. Affymetrix Transcript ID: 6747308 Gene: Lypla1, lysophospholipase 1, NM_008866 B. Affymetrix Transcript ID: 6865573 Gene: Cep120, centrosomal protein 120, NM_178686 C. Affymetrix Transcript ID: 6770693 Gene: Osbpl8, oxysterol binding protein-like 8, NM_175489 D. Affymetrix Transcript ID: 6770718 Gene: Nap1l1, nucleosome assembly protein 1-like 1 NM_015781 E. Affymetrix Transcript ID: 6839871 Gene: Hira, histone cell cycle regulation defective homolog A, NM_010435. F. Affymetrix Transcript ID: 6814200 Gene: Mus musculus mRNA for mKIAA0947 protein. ENSMUST00000043493//ENSEMBL//hypothetical protein LOC218333 isoform 1 gene: ENSMUSG00000034525 G. Affymetrix Transcript ID: 6915559 Gene: Fggy, FGGY carbohydrate kinase domain containing, NM_029347 H. Affymetrix Transcript ID: 6825511 Gene: NM_028032, Ppp2r2a, protein phosphatase 2 (formerly 2A) regulatory subunit B (PR 52) alpha isoform.

(PDF)

Figure S2

Human. This supplementary figure provides further examples showing the splicing index as a function of the annotated exon number (the format is the same as the one in Figure 3B ; see Figure 3B caption for details). A. Affymetrix Transcript ID: 2477073, NM_016441, CRIM1, cysteine rich transmembrane BMP regulator 1 (chordin-like). B. Affymetrix Transcript ID: 2481379, NM_172311, STON1-GTF2A1L, STON1-GTF2A1L read through transcript. C. Affymetrix Transcript ID: 2482505, NM_003128, SPTBN1, spectrin beta, non-erythrocytic 1. D. Affymetrix Transcript ID: 2639552, NM_003947//KALRN//kalirin, RhoGEF kinase. E. Affymetrix Transcript ID: 2639734, NM_007064//KALRN//kalirin, RhoGEF kinase. F. Affymetrix Transcript ID: 2829171, NM_003202//TCF7//transcription factor 7 (T-cell specific, HMG-box). G. Affymetrix Transcript ID: 3179975, NM_005392//PHF2//PHD finger protein 2. H. Affymetrix Transcript ID: 3183604, NM_021224//ZNF462//zinc finger protein 462.

(PDF)

Figure S3

This figure provides complementary data to Figure 4 . A–B. Standard error of the FENAS signal for mouse (A) and human (B). There is one line for each tissue but the curves overlap. C–D. Number of transcripts (count) with a given exon number for mouse (C) and human (D).

(PDF)

Figure S4

This figure provides complementary data to Figure 6. A–B. Standard error of h_{m,k} for mouse (A) and human (B). C–D. Number of transcripts (count) with a given number of exons in mouse (C) and human (D).

(PDF)

Text S1

List of variables defined in the text.

(PDF)

Funding Statement

This work was funded by NSF grant #0954570 and NIH grants #DP2OD006461-01 and #1R21NS070250-01A1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Levin B (2003) Genes VIII. Genes and Signals. Prentice Hall.
2. Ptashne M, Gann A (2002) Genes and Signals. New York: Cold Spring Harbor Laboratory Press.
3. Sharp P (1994) Split genes and RNA splicing. Cell 77: 805–815.
4. Manley JL, Tacke R (1996) SR proteins and splicing control. Genes Dev 10: 1569–1579.
5. Burge CB, Tuschl T, Sharp PA (1999) Splicing of precursors to mRNAs by the spliceosome. In: Gesteland R, Cech TR, Atkins JF, editors. The RNA World. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press. pp. 525–560.
6. Blencowe B (2000) Exonic splicing enhancers: mechanism of action, diversity and role in human genetic diseases. Trends Biochem Sci 25: 106–110.
7. Black D (2003) Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem 72: 291–336.
8. Black DL (2000) Protein Diversity from Alternative Splicing: A Challenge for Bioinformatics and Post-Genome Biology. Cell 103: 3.
9. Graveley B (2001) Alternative splicing: increasing diversity in the proteomic world. Trends Genet 17: 100–107.
10. Yeo G, Holste D, Kreiman G, Burge C (2004) Variation in alternative splicing across human tissues. Genome Biol 5: R74.
11. Kabat J, Barberan-Soler S, McKenna P, Clawson H, Farrer T, Zahler AM (2006) Intronic Alternative Splicing Regulators Identified by Comparative Genomics in Nematodes. PLoS Comput Biol 2: 0734–0747.
12. Lam B, Hertel KJ (2002) A general role for splicing enhancers in exon definition. RNA 8: 1233–1241.
13. Hertel K, Maniatis T (1998) The Function of Multisite Splicing Enhancers. Mol Cell 1: 449–455.
14. Reed R (1996) Initial splice-site recognition and pairing during pre-mRNA splicing. Curr Opin Gen Dev 6: 215–220.
15. Neugebauer KM (2002) On the importance of being co-transcriptional. J Cell Sci 115: 6.
16. Kornblihtt A (2006) Chromatin, transcript elongation and alternative splicing. Nat Struct Mol Biol 13: 5–7.
17. Bentley D (2002) The mRNA assembly line: transcription and processing machines in the same factory. Curr Opin Gen Dev 14: 6.
18. Bentley D (2005) Rules of engagement: co-transcriptional recruitment of pre-mRNA processing factors. Curr Opin Cell Biol 17: 251–256.
19. Du L, Warren SL (1997) A Functional Interaction between the Carboxy-Terminal Domain of RNA Polymerase II and Pre-mRNA Splicing. J Cell Biol 136: 5–18.
20. de la Mata M, Alonso CR, Kadener S, Fededa JP, Blaustein M, et al. (2003) A Slow RNA Polymerase II Affects Alternative Splicing In Vivo. Mol Cell 12: 525–532.
21. Fairbrother WG, Yeh RF, Sharp PA, Burge CB (2002) Predictive Identification of Exonic Splicing Enhancers in Human Genes. Science 297: 1007–1013.
22. Fairbrother WG, Holste D, Burge C, Sharp PA (2004) Single nucleotide polymorphism-based validation of exonic splicing enhancers. PLoS Biol 2: 1388–1392.
23. Lim L, Burge CB (2001) A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci U S A 98: 11193–11198.
24. Huranová M, Ivani I, Benda A, Poser I, Brody Y, et al. (2010) The differential interaction of snRNPs with pre-mRNA reveals splicing kinetics in living cells. J Cell Biol 191: 75–86.
25. Rino J, Carvalho T, Braga J, Desterro JMP, Luhrmann R, et al. (2007) A Stochastic View of Spliceosome Assembly and Recycling in the Nucleus. PLoS Comput Biol 3: e201–222.
26. Grunwald D, Spottke B, Buschmann V, Kubitscheck U (2006) Intranuclear Binding Kinetics and Mobility of Single Native U1 snRNP Particles in Living Cells. Mol Biol Cell 17: 5017–5027.
27. Berg O, Winter RB, von Hippel PH (1981) Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and Theory. Biochemistry 20: 6929–6948.
28. Murugan R (2010) Theory of site-specific DNA-protein interactions in the presence of conformational fluctuations of DNA binding domains. Biophys J 99: 353–359.
29. Murugan R (2007) Generalized theory of site-specific DNA-protein interactions. Phys Rev E 76: 011901.
30. Lomholt M, Broek V, Kalisch S, Wuite G, Metzler R (2009) Facilitated diffusion with DNA coiling. Proc Natl Acad Sci U S A 106: 8204–8208.
31. Gardiner CW (2004) Handbook of Stochastic Methods. Berlin: Springer.
32. Elf J, Li GW, Xie XS (2007) Probing Transcription Factor Dynamics at the Single-Molecule Level in a Living Cell. Science 316: 1191–1194.
33. Huang RS, Duan S, Shukla SJ, Kistner EO, Clark TA, et al. (2007) Identification of genetic variants contributing to cisplatin-induced cytotoxicity by use of a genomewide approach. Am J Hum Genet 81: 427–437.
34. Huang RS, Duan S, Bleibel WK, Kistner EO, Zhang W, et al. (2007) A genome-wide approach to identify genetic variants that contribute to etoposide-induced cytotoxicity. Proc Natl Acad Sci U S A 104: 9758–9763.
35. Polymenidou M, Lagier-Tourenne C, Hutt KR, Huelga SC, Moran J, et al. (2011) Long pre-mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP-43. Nat Neurosci 14: 459–468.
36. Huelga SC, Vu AQ, Arnold JD, Liang TY, Liu PP, et al. (2012) Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Rep 1: 167–178.
37. Edgar R, Domrachev M, Lash AE (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30: 207–210.
38. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, et al. (2011) NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res 39: D1005–1010.
39. Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical Recipes: The art of scientific computing. Cambridge: Cambridge University Press.
40. Darzacq X, Shav-Tal Y, de Turris V, Brody Y, Shenoy SM, et al. (2007) In vivo dynamics of RNA polymerase II transcription. Nat Struct Mol Biol 14: 796–806.
41. Varani G, Nagai K (1998) RNA recognition by RNP proteins during RNA processing. Annu Rev Biophys Biomol Struct 27: 407–445.

Articles from PLoS Computational Biology are provided here courtesy of Public Library of Science