1.  Site identification in high-throughput RNA–protein interaction data 
Bioinformatics  2012;28(23):3013-3020.
Motivation: Post-transcriptional and co-transcriptional regulation is a crucial link between genotype and phenotype. The central players are the RNA-binding proteins, and experimental technologies [such as cross-linking with immunoprecipitation- (CLIP-) and RIP-seq] for probing their activities have advanced rapidly over the course of the past decade. Statistically robust, flexible computational methods for binding site identification from high-throughput immunoprecipitation assays, however, are largely lacking.
Results: We introduce a method for site identification which provides four key advantages over previous methods: (i) it can be applied on all variations of CLIP and RIP-seq technologies, (ii) it accurately models the underlying read-count distributions, (iii) it allows external covariates, such as transcript abundance (which we demonstrate is highly correlated with read count) to inform the site identification process and (iv) it allows for direct comparison of site usage across cell types or conditions.
Availability and implementation: We have implemented our method in a software tool called Piranha. Source code and binaries, licensed under the GNU General Public License (version 3) are freely available for download from
Supplementary information: Supplementary data available at Bioinformatics online.
PMCID: PMC3509493  PMID: 23024010
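The site-identification idea in entry 1 (model the read-count distribution, then flag bins with unexpectedly many reads) can be illustrated with a minimal sketch. This is not Piranha's implementation: the abstract does not name the distribution it fits, so the negative binomial background, the method-of-moments fit, the separate `background` sample, and all function names below are assumptions made for illustration. Covariates such as transcript abundance and multiple-testing correction are left out.

```python
import math
from statistics import mean, variance


def fit_nb_moments(background):
    """Method-of-moments fit of a negative binomial (size r, probability p).

    Assumption: `background` holds read counts from bins without binding sites.
    """
    m = mean(background)
    v = variance(background)  # sample variance
    if v <= m:
        raise ValueError("counts are not overdispersed; a Poisson model may fit better")
    p = m / v                 # follows from var / mean = 1 / p
    r = m * m / (v - m)       # follows from mean = r * (1 - p) / p
    return r, p


def nb_pmf(k, r, p):
    """P(X = k) under the negative binomial, computed in log space."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)


def upper_tail(k, r, p):
    """P(X >= k): the p-value for seeing k or more reads in one bin."""
    return max(0.0, 1.0 - sum(nb_pmf(j, r, p) for j in range(k)))


def call_sites(counts, background, alpha=0.01):
    """Return indices of bins whose read count is significant at level alpha."""
    r, p = fit_nb_moments(background)
    return [i for i, c in enumerate(counts) if upper_tail(c, r, p) < alpha]
```

Fitting the background on a separate sample keeps strong peaks from inflating the variance estimate; a real analysis would also need to correct the per-bin p-values for multiple testing.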
2.  Before It Gets Started: Regulating Translation at the 5′ UTR 
Translation regulation plays important roles in both normal physiological conditions and disease states. This regulation requires cis-regulatory elements, located mostly in the 5′ and 3′ UTRs, and trans-regulatory factors (e.g., RNA-binding proteins (RBPs)) that recognize specific RNA features and interact with the translation machinery to modulate its activity. In this paper, we discuss important aspects of 5′ UTR-mediated regulation by providing an overview of the characteristics and function of the main elements present in this region, such as uORFs (upstream open reading frames), secondary structures, and RBP-binding motifs, as well as the different mechanisms of translation regulation and the impact they have on gene expression and human health when deregulated.
PMCID: PMC3368165  PMID: 22693426
3.  Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line 
We provide a large-scale dataset on absolute protein and matching mRNA concentrations from the human medulloblastoma cell line Daoy. The correlation between mRNA and protein concentrations is significant and positive (Rs=0.46, R2=0.29, P-value<2e-16), although non-linear. Out of ∼200 tested sequence features, sequence length, frequency and properties of amino acids, as well as translation initiation-related features are the strongest individual correlates of protein abundance when accounting for variation in mRNA concentration. When integrating mRNA expression data and all sequence features into a non-parametric regression model (Multivariate Adaptive Regression Splines), we were able to explain up to 67% of the variation in protein concentrations. Half of the contributions were attributed to mRNA concentrations, the other half to sequence features relating to regulation of translation and protein degradation. The sequence features are primarily linked to the coding and 3′ untranslated region. To our knowledge, this is the most comprehensive predictive model of human protein concentrations achieved so far.
mRNA decay, translation regulation and protein degradation are essential parts of eukaryotic gene expression regulation (Hieronymus and Silver, 2004; Mata et al, 2005), which enable the dynamics of cellular systems and their responses to external and internal stimuli without having to rely exclusively on transcription regulation. The importance of these processes is emphasized by the generally low correlation between mRNA and protein concentrations. For many prokaryotic and eukaryotic organisms, <50% of the variation in protein abundance is explained by variation in mRNA concentrations (de Sousa Abreu et al, 2009).
Given the plethora of regulatory mechanisms involved, most studies have focused so far on individual regulators and specific targets. Particularly in human, we currently lack system-wide, quantitative analyses that evaluate the relative contribution of regulatory elements encoded in the mRNA and protein sequence. Existing studies have been carried out only in bacteria and yeast (Nie et al, 2006; Brockmann et al, 2007; Tuller et al, 2007; Wu et al, 2008). Here, we present the first comprehensive analysis of the impact of translation and protein degradation on protein abundance variation in a human cell line. For this purpose, we experimentally measured absolute protein and mRNA concentrations in the Daoy medulloblastoma cell line, using shotgun proteomics and microarrays, respectively (Figure 1). These data comprise one of the largest such sets available today for human. We focused on sequence features that likely impact protein translation and protein degradation, including length, nucleotide composition, structure of the untranslated regions (UTRs), coding sequence, composition of the translation initiation site, presence of upstream open reading frames, putative target sites of miRNAs, codon usage, amino-acid composition and protein degradation signals.
Three types of tests have been conducted: (a) we examined partial Spearman's rank correlation of numerical features (e.g. length) with protein concentration, accounting for variation in mRNA concentrations; (b) for numerical and categorical features (e.g. function), we compared two extreme populations with Welch's t-test and (c) using a Multivariate Adaptive Regression Splines model, we analyzed the combined contributions of mRNA expression and sequence features to protein abundance variation (Figure 1). To account for the non-linearity of many relationships, we use non-parametric approaches throughout the analysis.
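Test (a) can be sketched as follows, assuming the standard first-order partial correlation formula applied to rank-transformed data. The abstract does not describe the authors' actual computation, so this is an illustration only; the function names are hypothetical, with `x` standing for a numerical sequence feature, `y` for protein concentration and `z` for mRNA concentration.

```python
import math


def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y after accounting for z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

Because only ranks enter the calculation, the measure captures monotonic but non-linear relationships, which matches the non-parametric approach described in the text.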
We observed a significant positive correlation between mRNA and protein concentrations, larger than many previous measurements (de Sousa Abreu et al, 2009). We also show that the contribution of translation and protein degradation is at least as important as the contribution of mRNA transcription and stability to the abundance variation of the final protein products. Although variation in mRNA expression explains ∼25–30% of the variation in protein abundance, another 30–40% can be accounted for by characteristics of the sequences, which we identified in a comparative assessment of global correlates. Among these characteristics, sequence length, amino-acid frequencies and also nucleotide frequencies in the coding region are of strong influence (Figure 3A). Characteristics of the 3′UTR and of the 5′UTR, that is, length, nucleotide composition and secondary structures, describe another part of the variation, leaving 33% of the expression variation unexplained. The unexplained fraction may be accounted for by mechanisms not considered in this analysis (e.g. regulation by RNA-binding proteins or gene-specific structural motifs), as well as expression and measurement noise.
Our combined model including mRNA concentration and sequence features can explain 67% of the variation of protein abundance in this system—and thus has the highest predictive power for human protein abundance achieved so far (Figure 3B).
Transcription, mRNA decay, translation and protein degradation are essential processes during eukaryotic gene expression, but their relative global contributions to steady-state protein concentrations in multi-cellular eukaryotes are largely unknown. Using measurements of absolute protein and mRNA abundances in cellular lysate from the human Daoy medulloblastoma cell line, we quantitatively evaluate the impact of mRNA concentration and sequence features implicated in translation and protein degradation on protein expression. Sequence features related to translation and protein degradation have an impact similar to that of mRNA abundance, and their combined contribution explains two-thirds of protein abundance variation. mRNA sequence lengths, amino-acid properties, upstream open reading frames and secondary structures in the 5′ untranslated region (UTR) were the strongest individual correlates of protein concentrations. In a combined model, characteristics of the coding region and the 3′UTR explained a larger proportion of protein abundance variation than characteristics of the 5′UTR. The absolute protein and mRNA concentration measurements for >1000 human genes described here represent one of the largest datasets currently available, and reveal both general trends and specific examples of post-transcriptional regulation.
PMCID: PMC2947365  PMID: 20739923
gene expression regulation; protein degradation; protein stability; translation
4.  Musashi1 modulates cell proliferation genes in the medulloblastoma cell line Daoy 
BMC Cancer  2008;8:280.
Musashi1 (Msi1) is an RNA binding protein with a central role during nervous system development and stem cell maintenance. High levels of Msi1 have been reported in several malignancies including brain tumors, thereby associating Msi1 with cancer.
We used the human medulloblastoma cell line Daoy as a model system in this study to knock down the expression of Msi1 and determine the effects upon soft agar growth and neurosphere formation. Quantitative RT-PCR was conducted to evaluate the expression of cell proliferation, differentiation and survival genes in Msi1-depleted Daoy cells.
We observed that MSI1 expression was elevated in Daoy cells cultured as neurospheres compared to those grown as a monolayer. These data indicated that Msi1 might be involved in regulating proliferation in cancer cells. Here we show that shRNA-mediated Msi1 depletion in Daoy cells notably impaired their ability to form colonies in soft agar and to grow as neurospheres in culture. Moreover, differential expression of a group of Notch, Hedgehog and Wnt pathway-related genes, including MYCN, FOS, NOTCH2, SMO, CDKN1A, CCND2, CCND1, and DKK1, was also found in the Msi1 knockdown, demonstrating that Msi1 modulated the expression of a subset of cell proliferation, differentiation and survival genes in Daoy.
Our data suggested that Msi1 may promote cancer cell proliferation and survival as its loss seems to have a detrimental effect in the maintenance of medulloblastoma cancer cells. In this regard, Msi1 might be a positive regulator of tumor progression and a potential target for therapy.
PMCID: PMC2572071  PMID: 18826648
5.  Genomic Analyses Reveal Broad Impact of miR-137 on Genes Associated with Malignant Transformation and Neuronal Differentiation in Glioblastoma Cells 
PLoS ONE  2014;9(1):e85591.
miR-137 plays critical roles in the nervous system and tumor development; an increase in its expression is required for neuronal differentiation while its reduction is implicated in gliomagenesis. To evaluate the potential of miR-137 in glioblastoma therapy, we conducted genome-wide target mapping in glioblastoma cells by measuring the level of association between PABP and mRNAs in cells transfected with miR-137 mimics vs. controls via RIPSeq. Impact on mRNA levels was also measured by RNASeq. By combining the results of both experimental approaches, 1468 genes were found to be negatively impacted by miR-137 – among them, 595 (40%) contain miR-137 predicted sites. The most relevant targets include oncogenic proteins and key players in neurogenesis like c-KIT, YBX1, AKT2, CDC42, CDK6 and TGFβ2. Interestingly, we observed that several identified miR-137 targets are also predicted to be regulated by miR-124, miR-128 and miR-7, which are equally implicated in neuronal differentiation and gliomagenesis. We suggest that the concomitant increase of these four miRNAs in neuronal stem cells or their repression in tumor cells could produce a robust regulatory effect with major consequences to neuronal differentiation and tumorigenesis.
PMCID: PMC3899048  PMID: 24465609

