Nucleic Acids Res. 2012 May; 40(10): e73.
Published online 2012 February 9. doi:  10.1093/nar/gks061
PMCID: PMC3378896

Identification of activated cryptic 5′ splice sites using structure profiles and odds measure

Abstract

The activation of cryptic 5′ splice sites (5′ SSs) is often related to human hereditary diseases. DNA-based mutation screening strategies are commonly used to recognize cryptic 5′ SSs, because features of the local DNA sequence can influence the choice of cryptic 5′ SSs. To improve the identification of cryptic 5′ SSs, we developed a structure-based method, named SPO (structure profiles and odds measure), which combines two parameters, a structural feature derived from the hydroxyl radical cleavage pattern and an odds measure, to assess the likelihood that a cryptic 5′ SS is activated in competition with its paired authentic 5′ SS. The SPO algorithm achieves higher prediction accuracy than current tools for identifying activated cryptic 5′ SSs, including MaxEnt, MDD, the Markov model, the weight matrix model, the Shapiro and Senapathy matrix, Ri and ΔG. In addition, the ΔSPO scores predicted by the SPO algorithm exhibited a greater degree of correlation with the strength of cryptic 5′ SS activation than the scores from the other seven methods. In conclusion, the SPO algorithm improves the identification of cryptic 5′ SSs, can be applied in designing mutagenesis experiments for various splicing events and may be helpful for investigating the relationship between structural variants and human hereditary diseases.

INTRODUCTION

Mutations at splice sites occur frequently and result in the activation of the so-called cryptic splice sites (1–3). Two typical cases in human genes, BRCA1 and BRCA2, contain several intronic genetic variants (4,5), and approximately 5% of these are associated with splice site mutation (4). These mutations have a potential effect on the activation of cryptic 5′ splice sites (5′ SSs) (4,5) that lead to cryptic splicing events. These cryptic splicing events were considered aberrant and often cause human hereditary diseases (2,6). Therefore, predicting the activation of cryptic 5′ SSs is an essential approach in investigating human hereditary diseases.

Various approaches are used in cryptic 5′ SS identification. Recently, an EST-based method named cryptic splice finder (CSF) (7) used the spliced alignment of ESTs to identify the cryptic splice site. Although the CSF program is useful for investigating splicing mutation in genetic disease, it relies considerably on the availability of sufficient EST data and accurate genomic annotations. Another approach (8,9) used information content (Ri) to detect activated cryptic 5′ SSs in human genes. Ri is the dot product of a particular sequence vector and weight matrix derived from the nucleotide frequencies at each splice site and is used to interpret mutated authentic splice sites and associated splicing regulatory sites (9). Although Ri provides useful information for analyzing the nucleotide substitutions that potentially impair splicing, the identification of activated cryptic 5′ SSs was reported to be less accurate. Sahashi et al. (10) recently used the improved Ri to estimate the splicing consequences of mutations at human 5′ SSs and discovered that Ri had low sensitivity in predicting splicing mutations. In addition to the sequence-based analyses mentioned, a thermodynamic inference scheme, based on binding free energy (ΔG) toward the stability of the RNA duplex between 5′ SS and U1 snRNA, was proposed for 5′ SS selection (11). The method considered the effects of molecular structure and revealed that the ΔG method may discriminate strong and intermediate activation of cryptic 5′SSs in competition assays. However, the identification for the intrinsic strength of cryptic 5′ SSs using ΔG is considerably inaccurate (6). Recently, Buratti et al. (12) collected 254 cryptic 5′ SSs that were activated by mutations in human disease genes and analyzed the mutation patterns and nucleotide structures in detail. 
They also evaluated the performance of several computational methods, including the Shapiro and Senapathy matrix (S&S) (13), the weight matrix model (WMM) (14), the first-order Markov model (MM) (15), the maximum entropy model (MaxEnt) (16) and the maximum dependence decomposition model (MDD) (17), in discriminating authentic and cryptic 5′ SSs. Buratti et al. (12) concluded that most of the authentic 5′ SSs had a prediction score that was statistically higher than that of the cryptic 5′ SSs. Although most methods can locate splice sites by searching for specific sequence patterns, the differences between activated and inactivated splice sites are not addressed. In other words, these methods cannot identify the activation of cryptic splice sites when the mutations do not cause a change in prediction scores.

DNA molecules form complex structures and function by interacting with proteins, nucleic acids and other small regulatory molecules. To detect such interactions, the hydroxyl radical cleavage patterns (18,19) were widely used for monitoring structural changes of DNA molecules with single residue spatial resolution. For example, the hydroxyl radical cleavage pattern was used for assessing the structure of DNA molecules and their related biological regulation (20,21), especially the interactions of DNA–protein complexes (22–24). Recently, the hydroxyl radical cleavage patterns of DNA were discovered to be associated with context-dependent mutation rates in mammals (25) and local sequence bias of human mutation (26). In addition, Parker et al. (27) used the ORChID (OH Radical Cleavage Intensity Database) (28) as genome-scale structural information to analyze the functional non-coding regions of the human genome. Their results indicated that single-nucleotide polymorphisms could induce larger structural changes in the non-coding DNA, and DNA structural changes may help to identify the phenotype-associated mutations (27). Importantly, a recent report indicated that the changes of the structure properties of the local DNA sequence can influence the choice of cryptic 5′ SSs when DNA variants occur in human disease genes (29). Therefore, it is crucial to realize the influence of single base pair substitutions in local DNA sequence context on the mRNA splicing phenotype. According to these studies, the DNA structure change may be a crucial factor for studying cryptic 5′ SS activation in human hereditary diseases; therefore, we used the hydroxyl radical cleavage pattern as the structure feature to improve the prediction for cryptic 5′ SSs in human disease genes.

DNA-based mutation screening strategies (12,30) have been used to investigate cryptic 5′ SSs in genetic diseases, and the resulting features were applied in a prediction tool (30). In fact, some signals in the local DNA sequence that may influence the choice of 5′ SSs have been tested as splicing features for 5′ SS prediction (31). To our knowledge, the association between DNA structure and the choice of cryptic 5′ SSs is rarely discussed, and a structure-based method for screening activated cryptic 5′ SSs in human disease genes is not available. In this study, a structure-based method, named the structure profiles and odds measure (SPO) algorithm, was developed to quantitatively evaluate the activation of a cryptic 5′ SS in competition with its authentic 5′ SS. The SPO algorithm combines structural profiles with an odds measure to assess the activation likelihood for a cryptic 5′ SS. The results indicate that the SPO algorithm was more effective than the other seven approaches, including S&S (13), WMM (14), MM (15), MaxEnt (16), MDD (17), Ri (10) and ΔG methods (32), in identifying an activated cryptic 5′ SS in competition with its paired authentic 5′ SS. In addition, the ΔSPO score from the SPO algorithm was more effective than the other scores in identifying the inherent strength of 5′ SSs in human disease genes.

MATERIALS AND METHODS

Data sets

Two sets of human mutation splicing sequence data were used for the development and evaluation of the SPO algorithm. The first data set, HMD1, was collected from published studies (6,8,12) and contained 490 experimentally validated authentic and cryptic 5′ SS data pairs (Supplementary Table S1). Of the 490 data pairs, 275 were inactivated pairs and 215 were activated pairs. These 490 pairs of splice site sequences were used to train the SPO algorithm in determining a scoring threshold for the successful prediction of cryptic 5′ SS activation. The second data set, HMD2, contained 52 data pairs (Supplementary Table S2) from two competition assays, competition scheme I (CS-I) and competition scheme II (CS-II), each of which contained 26 authentic and cryptic 5′ SS data pairs (11). CS-I compared mutations of cryptic 5′ SSs with wild types of authentic 5′ SSs, whereas CS-II compared mutations of cryptic 5′ SSs with weakened types of authentic 5′ SSs. In both CS-I and CS-II, the group of 26 cryptic 5′ SSs was subdivided into 6 strong, 13 intermediate and 7 weak cryptic 5′ SSs according to their splicing strength. The HMD2 sequences were used solely to correlate the scoring method with the actual activation strength of cryptic 5′ SSs, independently of the 490 paired splicing sequences from HMD1. In total, 189 249 5′ SSs (10) from the entire human genome were extracted as source data for the SPO algorithm.

SPO algorithm

To estimate the likelihood of activating a cryptic 5′ SS, the SPO algorithm was developed by combining structural profiles with an odds measure. The structural profiles capture the local DNA structural change caused by a mutation in a 5′ SS, that is, the difference between the structure before and after the mutation, and the odds measure computes the relative probability for a splicing event to occur. Figure 1 shows the SPO algorithm. The details of defining and combining these two quantities (‘SP’ for structural profiles and ‘O’ for odds) into the proposed ‘SPO’ algorithm are as follows:

  • (1)
    First, a 5′ SS pattern was defined as {X1, X2, …, Xm}, where Xm represents the m-th nucleotide, drawn from the nucleotide bases {A, G, C, T}. X1, X2 and X3 come from the exonic region, and X6, X7, X8 and X9 come from the intronic region. X4 and X5 are the central consensus of a 5′ SS; following the convention for splice site coordinates, X4 and X5 take the positions of GT. Second, the hydroxyl radical cleavage pattern from ORChID (28) was used as the DNA structural profile, providing high-resolution quantitative information on the local shape of DNA molecules. Before the mutation, the DNA structural profile for a 5′ SS pattern was defined as (Y1b, Y2b, …, Ymb), where Ymb represents the structural profile of the m-th nucleotide. After the mutation, the DNA structural profile for a 5′ SS pattern was defined as (Y1a, Y2a, …, Yma), where Yma represents the structural profile of the m-th nucleotide. The DNA structural change for a 5′ SS between the states before and after the mutation was defined as S(Yma, Ymb) and was computed as the Euclidean distance between the two profiles:
    $$S(Y_m^{a}, Y_m^{b}) = \sqrt{\sum_{i=1}^{m} \left(Y_i^{a} - Y_i^{b}\right)^{2}} \qquad (1)$$
    The Sp value and the Sq value were defined as the structural change for a cryptic 5′ SS and for an authentic 5′ SS, respectively. Finally, the Sc value was defined as (Sp + Sq + 1) and used to assess the activation likelihood for a cryptic 5′ SS. The constant 1 keeps Sc = 1 when Sp + Sq = 0, so that Sc = 0 cannot cancel the Or value defined in step (2).
  • (2)
    The improved odds measure was used to identify activated cryptic 5′ SSs. All 189 249 known 5′ SSs (10) in the human genome were extracted as source data, and N denotes the total number of 5′ SSs in this source data. The odds (Os) were computed for each of the 4679 non-redundant sequences among the 189 249 5′ SSs. Os was defined as the square root of (M/N)/(1 − M/N), where M is the number of occurrences of a particular splicing sequence in the source data. If a splicing sequence did not appear in the source data, Os was defined as the square root of (0.25/N)/(1 − 0.25/N) to avoid the infinity caused by odds ratio calculations. The parameter 0.25 was taken from the study by Sahashi et al. (10). To increase the computation speed for Os, all 5′ SS sequences in the source data were permuted for each splicing sequence, and all Os values were then pre-computed and indexed in a database so that Os could be retrieved efficiently for any given splicing sequence. After the mutation, an improved odds ratio (Or) was defined as the Os value of a cryptic 5′ SS divided by the Os value of its paired authentic 5′ SS. Finally, the Or value was used to assess the activation likelihood for a cryptic 5′ SS.
  • (3)
    The SPO value was defined as the Sc value multiplied by the Or value. Finally, the SPO value was used as the ΔSPO score for identifying activated cryptic 5′ SSs.
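The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy sequence counts and the toy cleavage values are all hypothetical, and real ORChID profiles and the 189 249-site source data would have to be supplied by the user.

```python
import math

def structural_change(profile_before, profile_after):
    """Euclidean distance between two per-nucleotide cleavage profiles (Equation 1)."""
    if len(profile_before) != len(profile_after):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_after, profile_before)))

def odds(sequence, counts, n_total, unseen=0.25):
    """Os = sqrt((M/N) / (1 - M/N)); a sequence absent from the source data uses M = 0.25.

    Assumes M < N, which holds when the source data contains many distinct sequences.
    """
    m = counts.get(sequence, 0)
    if m == 0:
        m = unseen
    p = m / n_total
    return math.sqrt(p / (1.0 - p))

def spo_score(s_cryptic, s_authentic, cryptic_seq, authentic_seq, counts, n_total):
    """SPO = Sc * Or, with Sc = Sp + Sq + 1 and Or = Os(cryptic) / Os(authentic)."""
    sc = s_cryptic + s_authentic + 1.0
    odds_ratio = odds(cryptic_seq, counts, n_total) / odds(authentic_seq, counts, n_total)
    return sc * odds_ratio

# Hypothetical toy data: two 9-mer splice site sequences and their counts.
toy_counts = {"CAGGTAAGT": 50, "AAGGTGAGC": 20}
toy_total = 100
sp = structural_change([0.0, 0.0], [3.0, 4.0])  # toy two-position profile
score = spo_score(sp, 0.0, "AAGGTGAGC", "CAGGTAAGT", toy_counts, toy_total)
print(score)
```

Because Sc is at least 1, a pair with no structural change is scored by the odds ratio alone, and any structural change at either site scales that ratio upward.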

Figure 1.
Flow chart of SPO algorithm.

Performance analysis

The performance of the proposed SPO algorithm in the identification of activated cryptic 5′ SSs was compared with that of the other seven reported approaches, that is, S&S (13), WMM (14), MM (15), MaxEnt (16), MDD (17), Ri (10) and ΔG (32). The comparative evaluation used a 5-fold cross-validation of the 490 paired splicing sequences included in the HMD1 data set. First, all 490 pairs of splicing sequences were divided equally into five partitions. Each partition served in turn as the testing set, and the remaining four partitions were used for training. In total, five testing sets were used, and each training set was four times the size of its corresponding testing set. The indices used to evaluate the performance were sensitivity, specificity, accuracy, precision and F-measure, defined as TP/(TP + FN), TN/(FP + TN), (TP + TN)/(TP + FN + TN + FP), TP/(TP + FP) and 2 × (sensitivity × specificity)/(sensitivity + specificity), respectively. TP, TN, FP and FN represent the counts of true positive, true negative, false positive and false negative cases, respectively. The receiver operating characteristic (ROC) curves of the eight methods were constructed from the sensitivity and 1 − specificity obtained by varying the delta score threshold for calling a cryptic 5′ SS activated. The area under the ROC curve (AUC) was used as a measurement of their performance. In addition to these measures, Pearson's coefficient was used to evaluate the correlation between the predicted scores and the activation strength of cryptic 5′ SSs from the HMD2 data set.
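The evaluation indices can be written out as a short Python sketch. The function names are hypothetical, and returning 0.0 for an undefined ratio is an assumption made here for robustness rather than something the text specifies. The F-measure follows the definition given above (based on sensitivity and specificity); the more common definition combines precision and sensitivity instead.

```python
def _ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else 0.0

def confusion_counts(labels, scores, threshold):
    """Count TP, TN, FP, FN when a pair is called activated if score > threshold."""
    tp = tn = fp = fn = 0
    for activated, score in zip(labels, scores):
        predicted = score > threshold
        if activated and predicted:
            tp += 1
        elif activated:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def performance(tp, tn, fp, fn):
    """Compute the five indices exactly as defined in the text."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": _ratio(tp + tn, tp + fn + tn + fp),
        "precision": _ratio(tp, tp + fp),
        "f_measure": _ratio(2 * sensitivity * specificity, sensitivity + specificity),
    }

# Hypothetical counts, for illustration only.
print(performance(tp=8, tn=6, fp=2, fn=4))
```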

Determining ΔSPO score threshold for an activated cryptic 5′ SS

A 5-fold cross-validation of the 490 paired sequences from the HMD1 data set was conducted. This 5-fold cross-validation was also used to determine the ΔSPO threshold in the SPO algorithm. For each of the five sets of training sequences, the ΔSPO threshold that yielded the optimal F-measure on the corresponding testing sequences was chosen. The value that occurred most often among these five thresholds (compared to five decimal places) was designated as T, the ΔSPO threshold in the SPO algorithm. Based on this, a cryptic 5′ SS competing with its authentic 5′ SS was considered activated if its ΔSPO score was greater than T, and the amount by which the ΔSPO score exceeded T was used to rank the probability of such activation. If no single value occurred most often among the five thresholds, the 5-fold cross-validation was repeated until such a threshold was obtained.
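A rough sketch of this threshold selection is given below. It is an illustration only: the function names are hypothetical, scanning the observed scores as candidate thresholds is an assumption (the text does not say how candidates were generated), and the F-measure uses the sensitivity and specificity based definition from the performance analysis.

```python
from collections import Counter

def f_measure(labels, scores, threshold):
    """F-measure as defined in the text: 2 * sens * spec / (sens + spec)."""
    tp = sum(1 for y, s in zip(labels, scores) if y and s > threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y and s <= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if not y and s <= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if not y and s > threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return 2 * sens * spec / (sens + spec) if sens + spec else 0.0

def best_threshold(labels, scores):
    """Pick the candidate threshold (an observed score) with the highest F-measure."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f_measure(labels, scores, t))

def consensus_threshold(fold_thresholds, digits=5):
    """Return the most frequent per-fold threshold, or None when there is a tie."""
    ranked = Counter(round(t, digits) for t in fold_thresholds).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no single winner: the cross-validation would be repeated
    return ranked[0][0]

# Hypothetical per-fold thresholds, for illustration only.
print(consensus_threshold([1.2214, 1.2214, 1.3, 1.2214, 1.1]))
```

Returning `None` on a tie mirrors the rule that the cross-validation is repeated until one threshold occurs more often than any other.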

RESULTS AND DISCUSSION

Identification of activated cryptic 5′ SS by scoring methods

The HMD1 data set, which contained 490 pairs of human authentic and cryptic 5′ splice sequences, was used for evaluating the performance of the proposed SPO algorithm (Supplementary Table S3). A threshold of T = 1.2214, obtained from analyzing the HMD1 data set with 5-fold cross-validation, was used to determine whether a splice site was activated. The sensitivity, precision, specificity, false positive rate, accuracy and F-measure at different ΔSPO score thresholds are shown in Figure 2. Moreover, the other seven reported approaches, S&S (13), WMM (14), MM (15), MaxEnt (16), MDD (17), Ri (10) and ΔG (32), were used for comparison. Note that these seven approaches can evaluate the likelihood of a 5′ SS by searching for specific sequence patterns, but they do not consider the competition between a cryptic 5′ SS and its paired authentic 5′ SS. Therefore, to assess the likelihood that a cryptic 5′ SS is activated in competition with its paired authentic 5′ SS, these seven approaches were modified as follows. After the mutation, ‘ΔRi’ was defined as the Ri value of a cryptic 5′ SS minus the Ri value of its paired authentic 5′ SS. The other methods were modified by the same procedure, except for the ΔG method. Because ΔG is itself a difference, the delta of ΔG was represented by the symbol ‘ΔΔG’ and was defined as the ΔG value of the authentic 5′ SS minus the ΔG value of the cryptic 5′ SS. All seven deltas were derived from the same 490 paired splicing sequences. Finally, −0.009, −0.09, −0.27, 0.9362, 0.9836, −0.5408 and −0.0001 were obtained as the thresholds for ΔMaxEnt, ΔMDD, ΔMM, ΔWMM, ΔS&S, ΔRi and ΔΔG, respectively.
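The delta scheme for the comparison methods can be illustrated as follows. The thresholds are the values reported above; the function names and the example scores are hypothetical, and applying the same "greater than threshold" rule to every method is an assumption carried over from the ΔSPO rule.

```python
# Thresholds reported in the text for each delta score.
THRESHOLDS = {
    "SPO": 1.2214,
    "MaxEnt": -0.009,
    "MDD": -0.09,
    "MM": -0.27,
    "WMM": 0.9362,
    "S&S": 0.9836,
    "Ri": -0.5408,
    "ddG": -0.0001,
}

def delta_score(cryptic_score, authentic_score):
    """Delta for sequence-based methods: cryptic score minus authentic score."""
    return cryptic_score - authentic_score

def delta_delta_g(cryptic_dg, authentic_dg):
    """Delta for the free-energy method: authentic dG minus cryptic dG."""
    return authentic_dg - cryptic_dg

def is_activated(method, delta):
    """Call the cryptic site activated when its delta exceeds the method threshold."""
    return delta > THRESHOLDS[method]

# Hypothetical scores for one cryptic/authentic pair.
print(is_activated("MaxEnt", delta_score(8.0, 9.5)))
print(is_activated("ddG", delta_delta_g(-7.0, -5.0)))
```

The sign of ΔΔG is reversed relative to the other deltas because a more negative ΔG indicates a more stable duplex with U1 snRNA, so a stronger cryptic site yields a positive ΔΔG.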

Figure 2.
Sensitivity, specificity, precision, false positive rate, accuracy and F-measure vary with ΔSPO score. (A) Sensitivity and precision vary with ΔSPO score; (B) specificity and false positive rate vary with ΔSPO score; (C) accuracy ...

Table 1 summarizes the performance of these eight scoring methods. According to the results from the 5-fold cross-validation, the SPO algorithm outperformed the others in all six categories for accurately identifying activated cryptic 5′ SSs competing with paired authentic 5′ SSs. A different modification strategy for the seven scoring methods (taking the ratio of the cryptic 5′ SS score to the authentic 5′ SS score) was also tested, and the result remained consistent (Supplementary Table S4). The quantitative comparison between the scoring methods also showed that the SPO algorithm had the best prediction performance (Figure 3). In addition, the proposed SPO algorithm correctly predicted 166/202 (82.2%) of the point mutation cases, 8/9 (88.9%) of the deletion cases, 2/3 (66.7%) of the insertion cases and 1/1 (100%) of the duplication cases. In the comparison with the other seven reported approaches (Table 2), the SPO algorithm yielded the highest accuracy for the identification of activated cryptic 5′ SSs in the various mutant categories, especially in point mutation cases.

Figure 3.
Comparison of predictive accuracy of the scoring methods for identifying activated cryptic 5′ SSs. (A) Sensitivity versus 1 − specificity for the scoring methods; (B) false positive rate versus false negative rate for the scoring methods. ...
Table 1.
Performance of scoring methods in identifying activated cryptic 5′ SSs based on 490 paired splicing sequences included in the HMD1 data set
Table 2.
Accuracy of scoring methods in different mutant categories

Identification of cryptic 5′ SS of different strengths

To verify that the proposed SPO algorithm can identify cryptic 5′ SSs of various activation strengths, the HMD2 data set containing 52 data pairs from two competition assays (11) was used. It includes 12 strong, 26 intermediate and 14 weak 5′ SSs, classified according to their activation levels (11). In the comparison with the other seven methods (Table 3), the SPO algorithm consistently achieved a high accuracy in all three groups and yielded the highest accuracy when the three groups of data were combined pairwise, as in the study by Roca et al. (11).

Table 3.
Accuracy of scoring methods in competition assays based on 52 (12 strong, 26 intermediate and 14 weak) paired splicing sequences in the HMD2 data set

To correlate the strength of cryptic 5′ SS activation with the predicted ΔSPO scores, Pearson's coefficient (r value) was computed between these two variables using the HMD2 test data (consisting of the two competition assays CS-I and CS-II, each of which included 26 authentic and cryptic 5′ SS data pairs). Table 4 summarizes the resulting r values for the ΔSPO, ΔMaxEnt, ΔMDD, ΔMM, ΔWMM, ΔS&S, ΔRi and ΔΔG scores; the SPO algorithm displayed a greater degree of correlation than the others. In particular, the SPO algorithm performed well for both the CS-I and CS-II assays, whereas all the other seven methods performed worse for the CS-I assay than for the CS-II assay. Wild types of authentic 5′ SSs were used in the CS-I assay, but weakened types of authentic 5′ SSs were used in the CS-II assay (11). In the in vitro experiments, the average activation of cryptic 5′ SSs was considerably stronger (P = 6.13E−07) in the CS-II assay than in the CS-I assay. Therefore, cryptic 5′ SSs are activated more easily in the CS-II assay than in the CS-I assay. In summary, the SPO algorithm was able to correctly predict the activation of a cryptic 5′ SS and to infer the activation level from the amount by which the ΔSPO score exceeds its threshold. With this feature, it is reasonable to prioritize candidate cryptic 5′ SS activations by ranking the ΔSPO scores when a number of splicing pairs are available for consideration. In other words, the SPO algorithm can be used to predict novel cryptic 5′ SSs, especially when sequencing data (such as RNA-seq data) are not available.
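Both uses of the score described above, correlation with activation strength and ranking by excess over the threshold, can be sketched briefly. The function names and the example values are hypothetical; only the threshold T = 1.2214 comes from the text.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_by_excess(scores_by_pair, threshold=1.2214):
    """Keep pairs whose score exceeds the threshold, sorted by excess, largest first."""
    excess = [(name, score - threshold)
              for name, score in scores_by_pair.items() if score > threshold]
    return sorted(excess, key=lambda item: item[1], reverse=True)

# Hypothetical scores and activation strengths, for illustration only.
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(rank_by_excess({"pair_a": 2.2214, "pair_b": 1.0, "pair_c": 3.2214}))
```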

Table 4.
Pearson's correlation coefficients of the competition assays of 5′ SSs and their scores in the HMD2 data set

DNA structural profiles as an impact factor in cryptic 5′ SS

To analyze whether DNA structural profiles extracted from the hydroxyl radical cleavage pattern can improve the identification of activated cryptic 5′ SSs, the HMD1 and HMD2 data sets were used to estimate the effect of structural profiles. First, without the contribution of structural profiles, the identification of activated cryptic 5′ SSs in the HMD1 data set decreased by 7.9% in sensitivity, 4.4% in specificity, 5.9% in accuracy, 6.2% in F-measure, 6.2% in precision and 5.7% in AUC (relative to the results in Table 1). Second, without structural profiles, the SPO algorithm obtained a lower degree (82%) of correlation between the strength of cryptic 5′ SS activation and the ΔSPO score (relative to the results in Table 4), and its accuracy decreased to 0.865 in the analysis of the 52 data pairs from the HMD2 data set. Interestingly, the DNA structural profiles also improved the degree of correlation between the strength of cryptic 5′ SS activation and the scores from MaxEnt (16), MDD (17), MM (15), S&S (13), WMM (14), Ri (10) and ΔG (32) by 2, 2, 2, 4, 5, 6 and 1%, respectively, in the analysis of the HMD2 data set. For these seven methods, the Sc value was used as a weight factor that multiplied the original scores. For example, an improved ΔMaxEnt score was defined as the ΔMaxEnt score multiplied by the Sc value. The scores for the other methods were improved by the same strategy. These results indicate that DNA structural profiles derived from the hydroxyl radical cleavage pattern can improve the identification of activated cryptic 5′ SSs in human mutation cases.
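The weighting strategy for the comparison methods amounts to a single multiplication, sketched below. The function name and the numbers are hypothetical; Sc = Sp + Sq + 1 is the definition from the SPO algorithm section.

```python
def structure_weighted_delta(delta, s_cryptic, s_authentic):
    """Multiply a method's delta score by Sc = Sp + Sq + 1."""
    sc = s_cryptic + s_authentic + 1.0
    return delta * sc

# With no structural change (Sp = Sq = 0), Sc = 1 and the score is unchanged.
print(structure_weighted_delta(-1.5, 0.0, 0.0))
# A structural change at either site scales the delta away from zero.
print(structure_weighted_delta(2.0, 0.5, 0.5))
```

Because Sc is never below 1, the weighting preserves the sign of the original delta and only changes its magnitude.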

Although the effect of DNA structural profiles was useful for identifying activated cryptic 5′ SSs, the detailed relationship between the DNA structural profiles and the cryptic 5′ SSs is unclear. One possible explanation could be that the changes of the DNA structural profiles at either the cryptic 5′ SS or the corresponding authentic 5′ SS may respond to the strength of cryptic 5′ SS activation. On the other hand, the changes of DNA structural profiles may be involved in non-intronic splicing mechanism when mutation occurs on the DNA level. Some non-intronic splicing information was assumed to play a vital role in shaping the split structure of eukaryote genes (7). Consequently, the DNA structural profiles may improve the identification of cryptic 5′ SSs in eukaryote genes.

CONCLUSION

This study proposes the SPO algorithm, which combines structural profiles with an odds measure to obtain the ΔSPO score for identifying activated cryptic 5′ SSs. Based on the results, the SPO algorithm identifies cryptic 5′ SSs more accurately than the other seven methods, and its ΔSPO score also provides information for estimating the inherent strength of 5′ SSs in human mutation data. In practical applications, the SPO algorithm can be used as a tool for designing mutagenesis experiments for various splicing events and for studying how activated cryptic 5′ SSs lead to amino acid changes in human hereditary diseases.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1–4.

FUNDING

National Science Council of Taiwan (Grant No: NSC99-2627-M-001-005-MY3; 99-2621-B-001-005-MY2). Funding for open access charge: Biodiversity Research Center, Academia Sinica, Taiwan.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank the anonymous reviewers for their comments. We also appreciate the experimental data provided by the studies of Roca, Rogan and Buratti.

REFERENCES

1. Baralle D, Baralle M. Splicing in action: assessing disease causing sequence changes. J. Med. Genet. 2005;42:737–748.
2. Krawczak M, Reiss J, Cooper DN. The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences. Hum. Genet. 1992;90:41–54.
3. Nakai K, Sakamoto H. Construction of a novel database containing aberrant splicing mutations of mammalian genes. Gene. 1994;141:171–177.
4. Chen X, Truong TT, Weaver J, Bove BA, Cattie K, Armstrong BA, Daly MB, Godwin AK. Intronic alterations in BRCA1 and BRCA2: effect on mRNA splicing fidelity and expression. Hum. Mutat. 2006;27:427–435.
5. Guimarães CP, Lemos M, Menezes I, Coelho T, Sá-Miranda C, Azevedo JE. Characterisation of two mutations in the ABCD1 gene leading to low levels of normal ALDP. Hum. Genet. 2001;109:616–622.
6. Roca X, Sachidanandam R, Krainer AR. Intrinsic differences between authentic and cryptic 5′ splice sites. Nucleic Acids Res. 2003;31:6321–6333.
7. Kapustin Y, Chan E, Sarkar R, Wong F, Vorechovsky I, Winston RM, Tatusova T, Dibb NJ. Cryptic splice sites and split genes. Nucleic Acids Res. 2011;39:5837–5844.
8. Rogan PK, Faux BM, Schneider TD. Information analysis of human splice site mutations. Hum. Mutat. 1998;12:153–171.
9. Nalla VK, Rogan PK. Automated splicing mutation analysis by information theory. Hum. Mutat. 2005;25:334–342.
10. Sahashi K, Masuda A, Matsuura T, Shinmi J, Zhang Z, Takeshima Y, Matsuo M, Sobue G, Ohno K. In vitro and in silico analysis reveals an efficient algorithm to predict the splicing consequences of mutations at the 5′ splice sites. Nucleic Acids Res. 2007;35:5995–6003.
11. Roca X, Sachidanandam R, Krainer AR. Determinants of the inherent strength of human 5′ splice sites. RNA. 2005;11:683–698.
12. Buratti E, Chivers M, Královicová J, Romano M, Baralle M, Krainer AR, Vorechovsky I. Aberrant 5′ splice sites in human disease genes: mutation pattern, nucleotide structure and comparison of computational tools that predict their utilization. Nucleic Acids Res. 2007;35:4250–4263.
13. Shapiro MB, Senapathy P. RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res. 1987;15:7155–7174.
14. Staden R. Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Res. 1984;12:505–519.
15. Salzberg SL. A method for identifying splice sites and translational start sites in eukaryotic mRNA. Comput. Appl. Biosci. 1997;13:365–376.
16. Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J. Comput. Biol. 2004;11:377–394.
17. Burge C, Karlin S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 1997;268:78–94.
18. Tullius TD, Dombroski BA. Iron(II) EDTA used to measure the helical twist along any DNA molecule. Science. 1985;230:679–681.
19. Shcherbakova I, Brenowitz M. Monitoring structural changes in nucleic acids with single residue spatial and millisecond time resolution by quantitative hydroxyl radical footprinting. Nat. Protoc. 2008;3:288–302.
20. Shafer GE, Price MA, Tullius TD. Use of the hydroxyl radical and gel electrophoresis to study DNA structure. Electrophoresis. 1989;10:397–404.
21. Price MA, Tullius TD. Using hydroxyl radical to probe DNA structure. Methods Enzymol. 1992;212:194–219.
22. Jain SS, Tullius TD. Footprinting protein-DNA complexes using the hydroxyl radical. Nat. Protoc. 2008;3:1092–1100.
23. Tullius TD, Dombroski BA. Hydroxyl radical “footprinting”: high-resolution information about DNA-protein contacts and application to lambda repressor and Cro protein. Proc. Natl Acad. Sci. USA. 1986;83:5469–5473.
24. Viola IL, Gonzalez DH. Footprinting and missing nucleoside analysis of transcription factor-DNA complexes. Methods Mol. Biol. 2011;754:259–275.
25. Stoltzfus A. Evidence for a predominant role of oxidative damage in germline mutation in mammals. Mutat. Res. 2008;644:71–73.
26. Nakken S, Rødland EA, Hovig E. Impact of DNA physical properties on local sequence bias of human mutation. Hum. Mutat. 2010;31:1316–1325.
27. Parker SC, Hansen L, Abaan HO, Tullius TD, Margulies EH. Local DNA topography correlates with functional noncoding regions of the human genome. Science. 2009;324:389–392.
28. Greenbaum JA, Pang B, Tullius TD. Construction of a genome-scale structural map at single-nucleotide resolution. Genome Res. 2007;17:947–953.
29. Krawczak M, Thomas NS, Hundrieser B, Mort M, Wittig M, Hampe J, Cooper DN. Single base-pair substitutions in exon-intron junctions of human genes: nature, distribution, and consequences for mRNA splicing. Hum. Mutat. 2007;28:150–158.
30. Divina P, Kvitkovicova A, Buratti E, Vorechovsky I. Ab initio prediction of mutation-induced cryptic splice-site activation and exon skipping. Eur. J. Hum. Genet. 2009;17:759–765.
31. Dogan RI, Getoor L, Wilbur WJ, Mount SM. SplicePort–an interactive splice-site analysis tool. Nucleic Acids Res. 2007;35:W285–W291.
32. Markham NR, Zuker M. DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res. 2005;33:W577–W581.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press