Alternative splicing (AS), which overturned the classical “one gene, one protein” hypothesis, enables higher eukaryotes to produce a large number of transcripts from a limited number of genes, and has been proposed as a primary driver of the evolution of phenotypic complexity in mammals. In humans, ~95% of multi-exon genes undergo alternative splicing, which explains the numerical disparity between the small number of human protein-coding genes (~26,000) and the large number of human proteins (more than 90,000). Alternative pre-messenger RNA splicing also influences development, physiology, and disease; many studies have reported cancer-specific alternative splicing in the absence of genomic mutations (for a review, see
Several methods have been applied to detect AS events. Expressed sequence tag (EST) sequencing was the first widely used technology and played a leading role in detecting AS events. However, in addition to its relatively high cost, EST technology has many other limitations, including genomic contamination, cloning bias, paralog confusion, 3′ bias, and low sensitivity for low-abundance transcripts. Interpreting EST data also requires considerable effort. Microarray technologies have also played a prominent role in shaping our understanding of the complexity of the transcriptome. Recently, whole-transcript microarrays were used to monitor 24,426 alternative splicing events in 48 human tissues and cell lines. Although this technology has been used extensively, limitations persist, including limited probe coverage, cross-hybridization artifacts, the requirement for previously known gene structures, and difficulties in data analysis.
More recently, rapid progress in massively parallel sequencing platforms such as Illumina/Solexa and Applied Biosystems/SOLiD has provided unprecedented opportunities to interrogate alternative RNA splicing. With these technologies, tens of millions of short tags (25–75 bases) can be sequenced simultaneously at less than 1% of the cost of traditional Sanger sequencing. Deep sequencing of the transcriptome (RNA-seq) has quickly become the most powerful technique for interrogating the whole transcriptional landscape, including both quantification of known transcripts and discovery of novel ones. In principle, all splicing events, as well as chimeric transcripts, can be detected directly. However, downstream analysis of RNA-seq data remains a major challenge.
Several major forms of alternative splicing, such as exon skipping, mutually exclusive exons, alternative first/last exons, and intron retention, can be detected simply by mapping RNA-seq reads to hypothetical splicing junctions. The reliability of a splicing junction is determined by: 1) the number of reads mapping to the junction (junction reads); 2) the number of mismatches in each mapped read; 3) the mapping position of the read on the junction, i.e. how close the center of the read is to the junction itself (the shorter this distance, the less likely the mapping occurred by chance); and 4) the positions of mismatches on junction reads, e.g. mismatches at either end of a read are more likely due to sequencing errors, while those in the middle of the read are more likely to be polymorphisms.
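To make features 2–4 concrete, the sketch below places a read on a hypothetical junction sequence and extracts the per-read quantities. It assumes an ungapped alignment, a junction template built from `flank` bases on each side of the junction, and an arbitrary `end_window` for deciding what counts as a read-end mismatch; the names `junction_template`, `score_junction_read`, and `JunctionReadFeatures` are illustrative, not taken from any published tool. Feature 1 is simply the number of reads scored against the same junction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JunctionReadFeatures:
    n_mismatches: int              # feature 2: mismatches in the mapped read
    center_offset: float           # feature 3: |read center - junction|, in bases
    mismatch_positions: List[int]  # 0-based mismatch positions within the read
    end_mismatches: int            # feature 4: mismatches within end_window of a read end


def junction_template(upstream_exon: str, downstream_exon: str, flank: int) -> str:
    """Hypothetical junction sequence: the last `flank` bases of the upstream
    exon joined to the first `flank` bases of the downstream exon."""
    return upstream_exon[-flank:] + downstream_exon[:flank]


def score_junction_read(read: str, template: str, start: int, flank: int,
                        end_window: int = 5) -> JunctionReadFeatures:
    """Ungapped comparison of `read` placed at 0-based `start` on `template`.

    The junction is the boundary between template positions flank - 1 and flank.
    """
    segment = template[start:start + len(read)]
    if start < 0 or len(segment) != len(read):
        raise ValueError("read extends past the junction template")
    # Junction in read coordinates: read bases [0, junction) lie upstream.
    junction = flank - start
    if not 0 < junction < len(read):
        raise ValueError("read does not span the junction")
    mismatches = [i for i, (a, b) in enumerate(zip(read, segment)) if a != b]
    center_offset = abs(len(read) / 2 - junction)
    end_mm = sum(1 for i in mismatches
                 if i < end_window or i >= len(read) - end_window)
    return JunctionReadFeatures(len(mismatches), center_offset, mismatches, end_mm)
```

For example, a 6-base read placed so that 3 bases fall on each exon has `center_offset == 0.0`, the most trustworthy placement under feature 3.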
However, most previous studies considered only the first of these features, the junction read count, i.e. an exon junction is considered real if it has more than R junction reads (R = 1 or 2). As demonstrated in the results, this read-counting method has high false positive and false negative rates. In contrast, in one of the two earliest pioneering human transcriptome studies, Pan et al. used features similar to those described above to train both linear and nonlinear classifiers for true splicing junction detection, and achieved superior results.
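A minimal sketch of the read-counting baseline follows, assuming each mapped junction read is represented only by the identifier of the junction it maps to; `count_based_calls` is a hypothetical name, and `R` is the threshold from the text.

```python
from collections import Counter
from typing import Iterable, Set


def count_based_calls(junction_ids: Iterable[str], R: int) -> Set[str]:
    """Read-counting baseline: accept a junction if more than R reads map to it.

    `junction_ids` holds one junction identifier per mapped junction read.
    Mismatch count, mapping position, and mismatch position are ignored.
    """
    counts = Counter(junction_ids)
    return {junction for junction, n in counts.items() if n > R}
```

Because only counts enter the decision, two reads that each overlap a junction by a single base are accepted with R = 1 exactly as readily as two perfectly matched, centered reads, which is one source of the false positives noted above.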
In this paper, we introduce a new statistical metric, Minimal Match on Either Side of exon junction (MMES), to measure the “quality” of junction reads by integrating all the features listed above. We then present a simple yet effective empirical statistical model that uses this metric to detect splicing junctions in real RNA-seq data. When validated against two highly reliable mouse transcript databases, this MMES-based empirical method is considerably more accurate than the read-counting method and also outperforms the logistic regression method used by Pan et al.
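The exact MMES definition is not given in this section, so the sketch below rests on an assumption: for one junction read, count the run of matching bases extending outward from the junction on each side until the first mismatch, and take the smaller of the two runs. Under that reading, a single number reflects how far the read reaches past the junction (feature 3) and penalizes mismatches near the junction more than mismatches at the read ends (features 2 and 4). The junction-level summary `junction_mmes` (maximum over supporting reads) is likewise a hypothetical choice, and the empirical statistical model itself is not sketched here.

```python
from typing import Iterable


def mmes(read: str, template: str, start: int, flank: int) -> int:
    """Assumed per-read MMES: the smaller of the two runs of matching bases that
    extend outward from the junction (toward each read end) before a mismatch."""
    segment = template[start:start + len(read)]
    if start < 0 or len(segment) != len(read):
        raise ValueError("read extends past the junction template")
    junction = flank - start  # read bases [0, junction) lie on the upstream exon
    if not 0 < junction < len(read):
        raise ValueError("read does not span the junction")

    left = 0
    for i in range(junction - 1, -1, -1):  # walk leftward from the junction
        if read[i] != segment[i]:
            break
        left += 1

    right = 0
    for i in range(junction, len(read)):  # walk rightward from the junction
        if read[i] != segment[i]:
            break
        right += 1

    return min(left, right)


def junction_mmes(read_scores: Iterable[int]) -> int:
    """Hypothetical junction-level summary: the best MMES among supporting reads."""
    return max(read_scores, default=0)
```

On the template `"CCCCCGGGGG"` (junction after the fifth base), the perfect centered read `"CCCGGG"` at offset 2 scores 3, a read-end mismatch (`"ACCGGG"`) only lowers it to 2, and a mismatch adjacent to the junction (`"CCAGGG"`) drops it to 0, even though the last two reads each carry a single mismatch.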