PMCC PMCC

Results 1-9 (9)
 

author:("Sun, studying")
1.  SpliceTrap: a method to quantify alternative splicing under single cellular conditions 
Bioinformatics  2011;27(21):3010-3016.
Motivation: Alternative splicing (AS) is a pre-mRNA maturation process leading to the expression of multiple mRNA variants from the same primary transcript. More than 90% of human genes are expressed via AS. Therefore, quantifying the inclusion level of every exon is crucial for generating accurate transcriptomic maps and studying the regulation of AS.
Results: Here we introduce SpliceTrap, a method to quantify exon inclusion levels using paired-end RNA-seq data. Unlike other tools, which focus on full-length transcript isoforms, SpliceTrap approaches the expression-level estimation of each exon as an independent Bayesian inference problem. In addition, SpliceTrap can identify major classes of alternative splicing events under a single cellular condition, without requiring a background set of reads to estimate relative splicing changes. We tested SpliceTrap both by simulation and real data analysis, and compared it to state-of-the-art tools for transcript quantification. SpliceTrap demonstrated improved accuracy, robustness and reliability in quantifying exon-inclusion ratios.
Conclusions: SpliceTrap is a useful tool to study alternative splicing regulation, especially for accurate quantification of local exon-inclusion ratios from RNA-seq data.
Availability and Implementation: SpliceTrap can be implemented online through the CSH Galaxy server http://cancan.cshl.edu/splicetrap and is also available for download and installation at http://rulai.cshl.edu/splicetrap/.
Contact: michael.zhang@utdallas.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr508
PMCID: PMC3198574  PMID: 21896509
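To illustrate what treating each exon as an independent Bayesian inference problem can look like, here is a minimal sketch. It is not SpliceTrap's actual model: it assumes a simple Beta prior on the inclusion ratio and read counts that have already been normalized for the number of junctions. The function name and parameters are hypothetical.

```python
def inclusion_ratio_posterior(inclusion_reads, skipping_reads,
                              prior_a=1.0, prior_b=1.0):
    """Posterior mean of an exon-inclusion ratio.

    Assumption (not from the abstract): reads supporting inclusion vs.
    skipping are binomial draws, with a Beta(prior_a, prior_b) prior on
    the inclusion ratio. The posterior is then
    Beta(prior_a + inclusion_reads, prior_b + skipping_reads).
    """
    if inclusion_reads < 0 or skipping_reads < 0:
        raise ValueError("read counts must be non-negative")
    a = prior_a + inclusion_reads
    b = prior_b + skipping_reads
    return a / (a + b)


# Each exon is handled on its own, without reference to other exons.
exons = {"exon_A": (30, 10), "exon_B": (0, 0)}
ratios = {name: inclusion_ratio_posterior(inc, skip)
          for name, (inc, skip) in exons.items()}
```

With no reads at all, the estimate falls back to the prior mean, which is one reason a Bayesian treatment can be more robust than a raw read ratio for low-coverage exons.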
2.  A Splicing-Independent Function of SF2/ASF in MicroRNA Processing 
Molecular Cell  2010;38(1):67-77.
SUMMARY
Both splicing factors and microRNAs are important regulatory molecules that play key roles in post-transcriptional gene regulation. By miRNA deep sequencing, we identified 40 miRNAs that are differentially expressed upon ectopic overexpression of the splicing factor SF2/ASF. Here we show that SF2/ASF and one of its upregulated microRNAs (miR-7) can form a negative feedback loop: SF2/ASF promotes miR-7 maturation, and mature miR-7 in turn targets the 3′UTR of SF2/ASF to repress its translation. Enhanced microRNA expression is mediated by direct interaction between SF2/ASF and the primary miR-7 transcript to facilitate Drosha cleavage and is independent of SF2/ASF’s function in splicing. Other miRNAs, including miR-221 and miR-222, may also be regulated by SF2/ASF through a similar mechanism. These results underscore a function of SF2/ASF in pri-miRNA processing and highlight the potential coordination between splicing control and miRNA-mediated gene repression in gene regulatory networks.
doi:10.1016/j.molcel.2010.02.021
PMCID: PMC3395997  PMID: 20385090
3.  How do alignment programs perform on sequencing data with varying qualities and from repetitive regions? 
BioData Mining  2012;5:6.
Background
Next-generation sequencing technologies generate a significant number of short reads that are utilized to address a variety of biological questions. However, quite often, sequencing reads tend to have low quality at the 3’ end and are generated from the repetitive regions of a genome. It is unclear how different alignment programs perform under these different cases. In order to investigate this question, we use both real data and simulated data with the above issues to evaluate the performance of four commonly used algorithms: SOAP2, Bowtie, BWA, and Novoalign.
Methods
The performance of the different alignment algorithms is measured in terms of concordance between any pair of aligners (for real sequencing data without known truth) and the accuracy of simulated read alignment.
Results
Our results show that, for sequencing data with reads that have relatively good quality or that have had low quality bases trimmed off, all four alignment programs perform similarly. We have also demonstrated that trimming off low quality ends markedly increases the number of aligned reads and improves the consistency among different aligners as well, especially for low quality data. However, Novoalign is more sensitive to the improvement of data quality. Trimming off low quality ends significantly increases the concordance between Novoalign and other aligners. As for aligning reads from repetitive regions, our simulation data show that reads from repetitive regions tend to be aligned incorrectly, and suppressing reads with multiple hits can improve alignment accuracy.
Conclusions
This study provides a systematic comparison of commonly used alignment algorithms in the context of sequencing data with varying qualities and from repetitive regions. Our approach can be applied to different sequencing data sets generated from different platforms. It can also be utilized to study the performance of other alignment programs.
doi:10.1186/1756-0381-5-6
PMCID: PMC3414812  PMID: 22709551
Next generation sequencing; Alignment; Sequencing quality; SOAP2; Bowtie; BWA; Novoalign
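The two operations discussed in this abstract, trimming low-quality 3' ends and measuring concordance between pairs of aligners, can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the trimming rule (drop trailing bases below a quality threshold), the concordance definition (fraction of reads aligned by both tools that land at the same position), and all names are hypothetical.

```python
from itertools import combinations


def trim_3prime(seq, quals, min_q=20):
    """Drop trailing bases whose quality score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def concordance(a, b):
    """Fraction of reads aligned by both tools that share a position.

    a and b map read IDs to (chromosome, position), or None if unaligned.
    """
    shared = [r for r in a
              if a[r] is not None and b.get(r) is not None]
    if not shared:
        return 0.0
    agree = sum(1 for r in shared if a[r] == b[r])
    return agree / len(shared)


def pairwise_concordance(alignments):
    """Concordance for every pair of aligners in a dict of results."""
    return {(x, y): concordance(alignments[x], alignments[y])
            for x, y in combinations(sorted(alignments), 2)}


# Toy data: two aligners, three reads.
toy = {
    "bowtie": {"r1": ("chr1", 100), "r2": ("chr2", 50), "r3": None},
    "bwa": {"r1": ("chr1", 100), "r2": ("chr2", 75), "r3": ("chr3", 5)},
}
scores = pairwise_concordance(toy)
```

Suppressing reads with multiple hits, as the abstract suggests for repetitive regions, would correspond here to setting such reads to None before computing concordance.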
4.  Preprocessing differential methylation hybridization microarray data 
BioData Mining  2011;4:13.
Background
DNA methylation plays a very important role in the silencing of tumor suppressor genes in various tumor types. In order to gain a genome-wide understanding of how changes in methylation affect tumor growth, the differential methylation hybridization (DMH) protocol has been developed and large amounts of DMH microarray data have been generated. However, it is still unclear how to preprocess this type of microarray data and how the different background correction and normalization methods used for two-color gene expression arrays perform on methylation microarray data. In this paper, we report a set of internal control probes that have log ratios (M) theoretically equal to zero according to the DMH protocol. With the aid of this set of control probes, we propose two LOESS (or LOWESS, locally weighted scatter-plot smoothing) normalization methods that are specific to DMH microarray data. Together with two other options (global LOESS and no normalization), this gives four normalization methods to compare. In addition, we compare five different background correction methods.
Results
We study 20 different preprocessing methods, which are the combinations of five background correction methods and four normalization methods. In order to compare these 20 methods, we evaluate their performance in identifying known methylated and unmethylated housekeeping genes based on two statistics. Comparison details are illustrated using breast cancer cell line and ovarian cancer patient methylation microarray data. Our comparison results show that the different background correction methods perform similarly; however, the four normalization methods perform very differently. In particular, all three LOESS normalization methods perform better than no normalization.
Conclusions
It is necessary to do within-array normalization, and the two LOESS normalization methods based on specific DMH internal control probes produce more stable and relatively better results than the global LOESS normalization method.
doi:10.1186/1756-0381-4-13
PMCID: PMC3118966  PMID: 21575229
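The role of control probes whose log ratio M should be zero can be shown with a deliberately simplified sketch. It computes the usual two-color M and A values and then shifts all M values by the median M of the control probes. This constant shift is a stand-in for illustration only; the paper proposes intensity-dependent LOESS fits, which are not implemented here. Function names and the toy intensities are hypothetical.

```python
import math
from statistics import median


def ma_values(red, green):
    """Return log ratios M = log2(R/G) and mean log intensities A."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def center_on_controls(m, control_idx):
    """Shift M so the control probes have a median log ratio of zero.

    Simplification: a single constant offset replaces the LOESS curve
    that would normally be fitted to the control probes against A.
    """
    shift = median(m[i] for i in control_idx)
    return [value - shift for value in m]


# Toy intensities; probes 0 and 1 play the role of internal controls.
red = [200.0, 400.0, 800.0, 100.0]
green = [100.0, 200.0, 200.0, 100.0]
m, a = ma_values(red, green)
m_normalized = center_on_controls(m, control_idx=[0, 1])
```

A LOESS-based version would replace the constant `shift` with a smooth function of A estimated from the control probes only, then subtract the fitted value at each probe's A.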
5.  Identifying hypermethylated CpG islands using a quantile regression model 
BMC Bioinformatics  2011;12:54.
Background
DNA methylation has been shown to play an important role in the silencing of tumor suppressor genes in various tumor types. In order to have a system-wide understanding of the methylation changes that occur in tumors, we have developed a differential methylation hybridization (DMH) protocol that can simultaneously assay the methylation status of all known CpG islands (CGIs) using microarray technologies. A large percentage of signals obtained from microarrays can be attributed to various measurable and unmeasurable confounding factors unrelated to the biological question at hand. In order to correct the bias due to noise, we first implemented a quantile regression model, with a quantile level equal to 75%, to identify hypermethylated CGIs in an earlier work. As a proof of concept, we applied this model to methylation microarray data generated from breast cancer cell lines. However, we were unsure whether 75% was the best quantile level for identifying hypermethylated CGIs. In this paper, we attempt to determine which quantile level should be used to identify hypermethylated CGIs and their associated genes.
Results
We introduce three statistical measurements to compare the performance of the proposed quantile regression model at different quantile levels (95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%), using known methylated genes and unmethylated housekeeping genes reported in breast cancer cell lines and ovarian cancer patients. Our results show that the quantile levels ranging from 80% to 90% are better at identifying known methylated and unmethylated genes.
Conclusions
In this paper, we propose to use a quantile regression model to identify hypermethylated CGIs by incorporating probe effects to account for noise due to unmeasurable factors. Our model can efficiently identify hypermethylated CGIs in both breast and ovarian cancer data.
doi:10.1186/1471-2105-12-54
PMCID: PMC3051900  PMID: 21324121
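As background for the choice of quantile level, the sketch below shows the check (pinball) loss that quantile regression minimizes, using an intercept-only model. It is not the paper's model, which also incorporates probe effects; the function names are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss for one residual at quantile level tau (0 < tau < 1)."""
    if residual >= 0:
        return tau * residual
    return (tau - 1.0) * residual


def total_loss(values, candidate, tau):
    """Sum of check losses when every value is predicted by `candidate`."""
    return sum(pinball_loss(v - candidate, tau) for v in values)


def best_constant(values, tau):
    """Intercept-only quantile fit.

    A minimizer of the check loss can always be found among the
    observed values, so only those candidates are searched.
    """
    return min(sorted(values), key=lambda c: total_loss(values, c, tau))


signals = list(range(1, 10))            # toy probe signals 1..9
fit_75 = best_constant(signals, 0.75)   # higher level, higher fitted value
fit_50 = best_constant(signals, 0.50)   # the median
```

Raising the quantile level moves the fitted value upward, so fewer signals lie above it. That is why the level chosen affects how many CGIs end up being called hypermethylated.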
6.  SF2/ASF Autoregulation Involves Multiple Layers of Post-transcriptional and Translational Control 
SF2/ASF is a prototypical SR protein, with important roles in splicing and other aspects of mRNA metabolism. SFRS1 (SF2/ASF) is a potent proto-oncogene with abnormal expression in many tumors. We found that SF2/ASF negatively autoregulates its expression to maintain homeostatic levels. We characterized six SF2/ASF alternatively spliced mRNA isoforms: the major isoform encodes full-length protein, whereas the others are either retained in the nucleus or degraded by NMD. Unproductive splicing accounts for only part of the autoregulation, which occurs primarily at the translational level. The effect is specific to SF2/ASF and requires RRM2. The ultraconserved 3′UTR is necessary and sufficient for downregulation. SF2/ASF overexpression shifts the distribution of target mRNA towards mono-ribosomes, and translational repression is partly independent of Dicer and a 5′ cap. Thus, multiple post-transcriptional and translational mechanisms are involved in fine-tuning the expression of SF2/ASF.
doi:10.1038/nsmb.1750
PMCID: PMC2921916  PMID: 20139984
7.  How old is this mutation? - a study of three Ashkenazi Jewish founder mutations 
BMC Genetics  2010;11:39.
Background
Several founder mutations leading to increased risk of cancer among Ashkenazi Jewish individuals have been identified, and some estimates of the age of the mutations have been published. A variety of methods have been used previously to estimate the age of the mutations. Here three datasets containing genotype information near known founder mutations are reanalyzed in order to compare three approaches for estimating the age of a mutation. The methods are: (a) the single marker method used by Risch et al. (1995); (b) the intra-allelic coalescent model known as DMLE; and (c) the Goldgar method proposed in Neuhausen et al. (1996), modified slightly by our group. The three mutations analyzed were MSH2*1906 G->C, APC*I1307K, and BRCA2*6174delT.
Results
All methods depend on accurate estimates of inter-marker recombination rates. The modified Goldgar method allows for marker mutation as well as recombination, but requires prior estimates of the possible haplotypes carrying the mutation for each individual. It does not incorporate population growth rates. The DMLE method estimates the haplotypes simultaneously with the mutation age, and builds in the population growth rate. The single marker estimates, however, are more sensitive to the recombination rates and are unstable. Mutation age estimates based on DMLE are 16.8 generations for MSH2 (95% credible interval 13-23), 106 generations for I1307K (86-129), and 90 generations for 6174delT (71-114).
Conclusions
For recent founder mutations where marker mutations are unlikely to have occurred, both DMLE and the Goldgar method can give good results. Caution is necessary for older mutations, especially if the effective population size may have remained small for a long period of time.
doi:10.1186/1471-2156-11-39
PMCID: PMC2889843  PMID: 20470408
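The sensitivity of single-marker estimates to the recombination rate can be seen in a short sketch. It uses a commonly cited form of the single-marker estimator: the excess of the ancestral marker allele on mutation-bearing chromosomes decays by a factor (1 - theta) per generation. Whether this matches the exact variant used in the paper is an assumption, and the function name and input values are hypothetical.

```python
import math


def single_marker_age(p_carrier, p_normal, theta):
    """Estimate mutation age in generations from one linked marker.

    p_carrier: frequency of the ancestral marker allele on chromosomes
               carrying the mutation
    p_normal:  frequency of the same allele on non-carrier chromosomes
    theta:     recombination fraction between marker and mutation

    Assumed model: delta = (p_carrier - p_normal) / (1 - p_normal)
    and delta = (1 - theta) ** generations.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if not 0.0 <= p_normal < 1.0:
        raise ValueError("p_normal must lie in [0, 1)")
    delta = (p_carrier - p_normal) / (1.0 - p_normal)
    if delta <= 0.0:
        raise ValueError("no excess of the ancestral allele on carriers")
    return math.log(delta) / math.log(1.0 - theta)


# Same allele frequencies, two recombination fractions.
age_theta_1pct = single_marker_age(0.9, 0.2, 0.01)
age_theta_2pct = single_marker_age(0.9, 0.2, 0.02)
```

Doubling theta roughly halves the estimated age here, which shows how strongly the single-marker estimate depends on the assumed recombination rate.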
8.  Breast Cancer-associated Fibroblasts Confer AKT1-mediated Epigenetic Silencing of Cystatin M in Epithelial Cells 
Cancer research  2008;68(24):10257.
The interplay between histone modifications and promoter hypermethylation provides a causative explanation for epigenetic gene silencing in cancer. Less is known about the upstream initiators that direct this process. Here, we report that the Cystatin M (CST6) tumor suppressor gene is concurrently down-regulated with other loci in breast epithelial cells co-cultured with cancer-associated fibroblasts (CAFs). Promoter hypermethylation of CST6 is associated with aberrant AKT1 activation in epithelial cells, as well as with disabling of the INPP4B regulator resulting from suppression by CAFs. Repressive chromatin, marked by trimethyl-H3K27 and dimethyl-H3K9, and de novo DNA methylation are established at the promoter. The findings suggest that microenvironmental stimuli are triggers in this epigenetic cascade, leading to the long-term silencing of CST6 in breast tumors. Our present findings implicate a causal mechanism defining how tumor stromal fibroblasts support neoplastic progression by manipulating the epigenome of mammary epithelial cells. The result also highlights the importance of direct cell-cell contact between epithelial cells and the surrounding fibroblasts that confer this epigenetic perturbation. Since this two-way interaction is anticipated, the described co-culture system can be used to determine the effect of epithelial factors on fibroblasts in future studies.
doi:10.1158/0008-5472.CAN-08-0288
PMCID: PMC2821873  PMID: 19074894
9.  Identifying differentially methylated genes using mixed effect and generalized least square models 
BMC Bioinformatics  2009;10:404.
Background
DNA methylation plays an important role in the process of tumorigenesis. Identifying differentially methylated genes or CpG islands (CGIs) associated with genes between two tumor subtypes is thus an important biological question. The methylation status of all CGIs in the whole genome can be assayed with differential methylation hybridization (DMH) microarrays. However, patient samples or cell lines are heterogeneous, so their methylation pattern may be very different. In addition, neighboring probes at each CGI are correlated. How these factors affect the analysis of DMH data is unknown.
Results
We propose a new method for identifying differentially methylated (DM) genes by identifying the associated DM CGI(s). At each CGI, we implement four different mixed effect and generalized least square models to identify DM genes between two groups. We compare these four models with a simple least square regression model to study the impact of incorporating random effects and correlations.
Conclusions
We demonstrate that the inclusion (or exclusion) of random effects and the choice of correlation structures can significantly affect the results of the data analysis. We also assess the false discovery rate of different models using CGIs associated with housekeeping genes.
doi:10.1186/1471-2105-10-404
PMCID: PMC2800121  PMID: 20003206
