Increasing evidence indicates that long non-coding RNAs (lncRNAs) are implicated in many complex human diseases. Despite the accumulation of lncRNA-disease associations, only a few studies have examined the roles of these associations in pathogenesis. In this paper, we investigated lncRNA-disease associations from a network view to understand the contribution of these lncRNAs to complex diseases. Specifically, we studied both the properties of the diseases in which the lncRNAs were implicated and those of the lncRNAs associated with complex diseases. Given that both protein-coding genes and lncRNAs are involved in human diseases, we constructed a coding-non-coding gene-disease bipartite network based on known associations between diseases and disease-causing genes. We then applied a propagation algorithm to uncover the hidden lncRNA-disease associations in this network. The algorithm was evaluated by leave-one-out cross validation on 103 diseases in which at least two genes were known to be involved, and achieved an AUC of 0.7881. Our algorithm successfully predicted 768 potential lncRNA-disease associations between 66 lncRNAs and 193 diseases. Furthermore, our results for Alzheimer's disease, pancreatic cancer, and gastric cancer were verified by other independent studies.
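The abstract above does not say which propagation algorithm was used, so the following is only a minimal sketch of the general idea, assuming a random walk with restart on a toy gene-disease bipartite graph. The function name `propagate`, the restart probability, and all node names are hypothetical and chosen for illustration.

```python
def propagate(adjacency, seeds, restart=0.5, iterations=100):
    """Random walk with restart on an undirected graph (illustrative only).

    adjacency: dict mapping each node to the set of its neighbours.
    seeds: the query nodes, e.g. the disease of interest.
    Returns a dict of scores; a higher score means closer to the seeds.
    """
    nodes = list(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(iterations):
        # A fraction `restart` of the mass jumps back to the seed nodes.
        updated = {n: restart * start[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue
            # The remaining mass is split evenly among the neighbours.
            share = (1.0 - restart) * scores[n] / len(neighbours)
            for m in neighbours:
                updated[m] += share
        scores = updated
    return scores


# Hypothetical bipartite toy network: diseases D1-D3, one coding gene G1,
# and two lncRNAs. Every edge links a disease to a gene.
edges = [("D1", "G1"), ("G1", "D2"), ("D2", "LNC1"),
         ("LNC1", "D3"), ("D3", "LNC2")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

scores = propagate(adjacency, seeds={"D1"})
# Rank the lncRNAs as candidates for disease D1.
ranked_lncrnas = sorted(["LNC1", "LNC2"], key=scores.get, reverse=True)
print(ranked_lncrnas)
```

In a leave-one-out setting, one known association would be removed from the network before propagation and the rank of the held-out gene would then be recorded. That evaluation loop is omitted here.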
Formalin-fixed paraffin-embedded (FFPE) tumor specimens are the conventionally archived material in clinical practice, representing an invaluable tissue source for biomarker development, validation and routine implementation. For many prospective clinical trials, this material has been collected, allowing for a prospective-retrospective study design, which represents a successful strategy to define clinical utility for candidate markers. Gene expression data can be obtained even from FFPE specimens with the broadly used Affymetrix HG-U133 Plus 2.0 microarray platform. Nevertheless, major discrepancies remain between expression data obtained from FFPE and from fresh-frozen samples, prompting the need for appropriate data processing that could help to obtain more consistent results in downstream analyses. In a publicly available dataset of matched frozen and FFPE expression data, the performances of different normalization methods and specifically designed Chip Description Files (CDFs) were compared. The use of alternative CDFs together with fRMA normalization significantly improved frozen-FFPE sample correlations, frozen-FFPE probeset correlations and agreement of differential analysis between different tumor subtypes. The relevance of our optimized data processing was assessed and validated using two independent datasets. In this study we demonstrated that appropriate data processing can significantly improve the reliability of gene expression data derived from FFPE tissues using the standard Affymetrix platform. Tools for the implementation of our data processing algorithm are made publicly available at http://www.biocut.unito.it/cdf-ffpe/.
In this paper we perform a genome-wide analysis of H. sapiens promoters. To this end, we developed and combined two mathematical methods that allow us to (i) classify promoters into groups characterized by specific global structural features, and (ii) recover, in full generality, any regular sequence in the different classes of promoters. One of the main findings of this analysis is that H. sapiens promoters can be classified into three main groups. Two of them are distinguished by the prevalence of weak or strong nucleotides and are characterized by short compositionally biased sequences, while the most frequent regular sequences in the third group are strongly correlated with transposons. Taking advantage of the generality of these mathematical procedures, we have compared the promoter database of H. sapiens with those of other species. We have found that the above-mentioned features also characterize the evolutionary content appearing in mammalian promoters, in contrast to ancestral species in the phylogenetic tree, which exhibit a clearly lower level of differentiation among promoters.
MicroRNAs are single-stranded non-coding RNAs that simultaneously down-modulate the expression of multiple genes post-transcriptionally by binding to the 3′UTRs of target mRNAs. Here we used computational methods to predict microRNAs relevant in breast cancer progression. Specifically, we applied different microRNA target prediction algorithms to various groups of differentially expressed protein-coding genes obtained from four breast cancer datasets. Six potential candidates were identified, among them miR-223, previously described to be highly expressed in the tumor microenvironment and known to be actively transferred into breast cancer cells. To investigate the function of miR-223 in tumorigenesis and to define its molecular mechanism, we overexpressed miR-223 in breast cancer cells in a transient or stable manner. Alternatively we overexpressed miR-223 in mouse embryonic fibroblasts or HEK293 cells and used their conditioned medium to treat tumor cells. With both approaches, we obtained elevated levels of miR-223 in tumor cells and observed decreased migration, increased cell death in anoikis conditions and augmented sensitivity to chemotherapy but no effect on adhesion and proliferation. The analysis of miR-223 predicted targets revealed enrichment in cell death and survival-related genes and in pathways frequently altered in breast cancer. Among these genes, we showed that protein levels for STAT5A, ITGA3 and NRAS were modulated by miR-223. In addition, we proved that STAT5A is a direct miR-223 target and highlighted a possible correlation between miR-223 and STAT5A in migration and chemotherapy response. Our investigation revealed that a computational analysis of cancer gene expression datasets can be a relevant tool to identify microRNAs involved in cancer progression and that miR-223 has a prominent role in breast malignancy that could potentially be exploited therapeutically.
Accumulating evidence shows a tight link between inflammation and cancer. However, comprehensive identification of pivotal transcription factors (i.e., core TFs) mediating the dysregulated links remains challenging, mainly due to a lack of samples that can effectively reflect the connections between inflammation and tumorigenesis. Here, we constructed a series of TF-mediated regulatory networks from a large compendium of expression profiling of normal colonic tissues, inflammatory bowel diseases (IBDs) and colorectal cancer (CRC), which contains 1201 samples in total, and then proposed a network-based approach to characterize potential links bridging inflammation and cancer. For this purpose, we computed significantly dysregulated relationships between inflammation and their linked cancer networks, and then 24 core TFs with their dysregulated genes were identified. Collectively, our approach provides us with quite important insight into inflammation-associated tumorigenesis in colorectal cancer, which could also be applied to identify functionally dysregulated relationships mediating the links between other different disease phenotypes.
The development of new therapies for orphan genetic diseases represents an extremely important medical and social challenge. Drug repositioning, i.e. finding new indications for approved drugs, could be one of the most cost- and time-effective strategies to cope with this problem, at least in a subset of cases. Therefore, many computational approaches based on the analysis of high throughput gene expression data have so far been proposed to reposition available drugs. However, most of these methods require gene expression profiles directly relevant to the pathologic conditions under study, such as those obtained from patient cells and/or from suitable experimental models. In this work we have developed a new approach for drug repositioning, based on identifying known drug targets showing conserved anti-correlated expression profiles with human disease genes, which is completely independent from the availability of ‘ad hoc’ gene expression data-sets.
By analyzing available data, we provide evidence that the genes displaying conserved anti-correlation with drug targets are antagonistically modulated in their expression by treatment with the relevant drugs. We then identified clusters of genes associated with similar phenotypes and showing conserved anti-correlation with drug targets. On this basis, we generated a list of potential candidate drug-disease associations. Importantly, we show that some of the proposed associations are already supported by independent experimental evidence.
Our results support the hypothesis that the identification of gene clusters showing conserved anticorrelation with drug targets can be an effective method for drug repositioning and provide a wide list of new potential drug-disease associations for experimental validation.
In pluripotent stem cells, there is increasing evidence for crosstalk between post-transcriptional and transcriptional networks, offering multiple steps at which pluripotency can be controlled. In addition to well-studied transcription factors, chromatin modifiers and miRNAs, RNA-binding proteins are emerging as fundamental players in pluripotency regulation. Here, we report a new role for the RNA-binding protein ESRP1 in the control of pluripotency. Knockdown of Esrp1 in mouse embryonic stem cells induces, in addition to the well-documented epithelial to mesenchymal-like state, an increase in expression of the core transcription factors Oct4, Nanog and Sox2, thereby enhancing self-renewal of these cells. Esrp1-depleted embryonic stem cells displayed impaired early differentiation in vitro and formed larger teratomas in vivo when compared to control embryonic stem cells. We also show that ESRP1 binds to Oct4 and Sox2 mRNAs and decreases their polysomal loading. ESRP1 thus acts as a physiological regulator of the finely tuned balance between self-renewal and commitment to a restricted developmental fate. Importantly, both mouse and human epithelial stem cells highly express ESRP1, pinpointing the importance of this RNA-binding protein in stem cell biology.
RNAseq and microarray methods are frequently used to measure gene expression levels. While similar in purpose, there are fundamental differences between the two technologies. Here, we present the largest comparative study between microarray and RNAseq methods to date using The Cancer Genome Atlas (TCGA) data. We found high correlations between expression data obtained from the Affymetrix one-channel microarray and RNAseq (Spearman correlation coefficients of ∼0.8). We also observed that low-abundance genes had poorer correlations between microarray and RNAseq data than high-abundance genes. As expected, due to measurement and normalization differences, Agilent two-channel microarray and RNAseq data were poorly correlated (Spearman correlation coefficients of only ∼0.2). By examining the differentially expressed genes between tumor and normal samples, we observed reasonable concordance in directionality between Agilent two-channel microarray and RNAseq data, although a small group of genes were found to have expression changes reported in opposite directions using these two technologies. Overall, RNAseq produces comparable results to microarray technologies in terms of expression profiling. The RNAseq normalization methods RPKM and RSEM produce similar results on the gene level and reasonably concordant results on the exon level. Longer exons tended to have better concordance between the two normalization methods than shorter exons.
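To make the cross-platform comparison concrete, the sketch below computes a Spearman rank correlation between two expression vectors in plain Python. It is not the study's pipeline, and the expression values are invented for illustration. Spearman correlation is the Pearson correlation of the ranks, which makes it insensitive to the very different measurement scales of the two platforms.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five genes measured on both platforms.
microarray = [7.1, 8.3, 5.2, 9.9, 6.0]   # e.g. log2 intensities
rnaseq = [12.0, 30.5, 8.7, 250.0, 3.1]   # e.g. RPKM values
rho = spearman(microarray, rnaseq)
print(round(rho, 3))
```

In this toy data the two lowest-abundance genes swap ranks between platforms, which is the kind of disagreement the abstract reports for low-abundance genes.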
Identifying subspace gene clusters from gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose to use low-rank representation (LRR) to identify the subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all the candidates that can represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted based on the block diagonal representation matrix obtained using LRR, and they capture well the intrinsic patterns of genes with similar functions. Meanwhile, the parameter of LRR can balance the effect of noise so that the method is capable of extracting useful information from data with a high level of background noise. Compared with traditional methods, our approach can identify genes with similar functions even when their expression profiles are not similar. It can also assign one gene to several clusters. Moreover, our method is robust to noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the results show that the LRR-based method is superior to existing methods for identifying subspace gene clusters.
Genomic copy number alterations are common in cancer. Finding the genes causally implicated in oncogenesis is challenging because the gain or loss of a chromosomal region may affect a few key driver genes and many passengers. Integrative analyses have opened new vistas for addressing this issue. One approach is to identify genes with frequent copy number alterations and corresponding changes in expression. Several methods also analyse effects of transcriptional changes on known pathways. Here, we propose a method that analyses in-cis correlated genes for evidence of in-trans association to biological processes, with no bias towards processes of a particular type or function. The method aims to identify cis-regulated genes for which the expression correlation to other genes provides further evidence of a network-perturbing role in cancer. The proposed unsupervised approach involves a sequence of statistical tests to systematically narrow down the list of relevant genes, based on integrative analysis of copy number and gene expression data. A novel adjustment method handles confounding effects of co-occurring copy number aberrations, potentially a large source of false positives in such studies. Applying the method to whole-genome copy number and expression data from 100 primary breast carcinomas, 6373 genes were identified as commonly aberrant, 578 were highly in-cis correlated, and 56 were in addition associated in-trans to biological processes. Among these in-trans process associated and cis-correlated (iPAC) genes, 28% have previously been reported as breast cancer associated, and 64% as cancer associated. By combining statistical evidence from three separate subanalyses that focus respectively on copy number, gene expression and the combination of the two, the proposed method identifies several known and novel cancer driver candidates. Validation in an independent data set supports the conclusion that the method identifies genes implicated in cancer.
The Dlx and Msx homeodomain transcription factors play important roles in the control of limb development. The combined disruption of Msx1 and Msx2, as well as that of Dlx5 and Dlx6, leads to limb patterning defects with anomalies in digit number and shape. Msx1;Msx2 double mutants are characterized by the loss of derivatives of the anterior limb mesoderm, which is not observed in either of the single mutants. Dlx5;Dlx6 double mutants exhibit hindlimb ectrodactyly. While the morphogenetic action of Msx genes seems to involve the BMP molecules, the mode of action of Dlx genes still remains elusive. Here, examining the limb phenotypes of combined Dlx and Msx mutants, we reveal a new Dlx-Msx regulatory loop directly involving BMPs. In Msx1;Dlx5;Dlx6 triple mutant mice (TKO), besides the expected ectrodactyly, we also observe the hallmark morphological anomalies of Msx1;Msx2 double mutants, suggesting an epistatic role of Dlx5 and Dlx6 over Msx2. In Msx2;Dlx5;Dlx6 TKO mice we only observe an aggravation of the ectrodactyly defect without changes in the number of the individual components of the limb. Using a combination of qPCR, ChIP and bioinformatic analyses, we identify two Dlx/Msx regulatory pathways: 1) in the anterior limb mesoderm a non-cell autonomous Msx-Dlx regulatory loop involves BMP molecules through the AER and 2) in AER cells and, at later stages, in the limb mesoderm the regulation of Msx2 by Dlx5 and Dlx6 occurs also cell autonomously. These data bring new elements to decipher the complex AER-mesoderm dialogue that takes place during limb development and provide clues to understanding the etiology of congenital limb malformations.
During embryonic development, immature neurons in the olfactory epithelium (OE) extend axons through the nasal mesenchyme to contact projection neurons in the olfactory bulb. Axon navigation is accompanied by migration of the GnRH+ neurons, which enter the anterior forebrain and home in the septo-hypothalamic area. This process can be interrupted at various points and lead to the onset of Kallmann syndrome (KS), a disorder characterized by anosmia and central hypogonadotropic hypogonadism. Several genes have been identified in humans and mice that cause KS or a KS-like phenotype. In mice a set of transcription factors appears to be required for olfactory connectivity and GnRH neuron migration; thus we explored the transcriptional network underlying this developmental process by profiling the OE and the adjacent mesenchyme at three embryonic ages. We also profiled the OE from embryos null for Dlx5, a homeogene that causes a KS-like phenotype when deleted. We identified 20 interesting genes belonging to the following categories: (1) transmembrane adhesion/receptor, (2) axon-glia interaction, (3) scaffold/adapter for signaling, (4) synaptic proteins. We tested some of them in zebrafish embryos: the depletion of five (of six) Dlx5 targets affected axonal extension and targeting, while three (of three) affected GnRH neuron position and neurite organization. Thus, we confirmed the importance of cell–cell and cell-matrix interactions and identified new molecules needed for olfactory connection and GnRH neuron migration. Using available and newly generated data, we predicted/prioritized putative KS-disease genes by building conserved co-expression networks with all known disease genes in human and mouse. The results show the overall validity of approaches based on high-throughput data and predictive bioinformatics to identify genes potentially relevant for the molecular pathogenesis of KS.
A number of candidates that should be tested in future mutation screens will be discussed.
olfactory development; GnRH neuron; Kallmann syndrome; extracellular matrix; transcription profiling; disease gene prediction
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have successfully exploited this concept by capturing the interaction of genes or proteins in a score. Nonetheless, most current approaches suffer from at least some of the following limitations: (1) networks comprise only curated physical interactions, leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering the full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest-path scores mostly on single-source networks. Heterogeneous associations consistently delivered superior performance over single-source data across the majority of methods. Results on the contribution of confidence weights were inconclusive.
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
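As a rough illustration of heat diffusion ranking on a weighted association network, the sketch below approximates exp(-tL) applied to a seed vector, where L = D - W is the weighted graph Laplacian, using small explicit Euler steps. It is a sketch under stated assumptions, not the authors' implementation: the gene names, confidence weights, diffusion time and the function name `heat_diffusion_scores` are all hypothetical.

```python
def heat_diffusion_scores(weights, seeds, time=0.5, steps=200):
    """Diffuse unit heat from seed nodes over a weighted undirected graph.

    weights: dict of dicts, weights[a][b] == weights[b][a] (edge confidence).
    The loop integrates dh/dt = -L h with explicit Euler steps, which
    approximates exp(-time * L) applied to the seed indicator vector.
    """
    nodes = sorted(weights)
    heat = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    dt = time / steps
    for _ in range(steps):
        # Net heat flowing into each node from its neighbours: -(L h)[n].
        flow = {
            n: sum(w * (heat[m] - heat[n]) for m, w in weights[n].items())
            for n in nodes
        }
        heat = {n: heat[n] + dt * flow[n] for n in nodes}
    return heat


# Hypothetical network: one known disease gene (SEED) and three candidates.
edges = [("SEED", "A", 0.9), ("SEED", "B", 0.2), ("A", "C", 0.5)]
weights = {}
for a, b, w in edges:
    weights.setdefault(a, {})[b] = w
    weights.setdefault(b, {})[a] = w

heat = heat_diffusion_scores(weights, seeds={"SEED"})
# Candidates are ranked by the heat they received from the seed.
candidates = sorted((n for n in heat if n != "SEED"), key=heat.get, reverse=True)
print(candidates[0])
```

Unlike a shortest-path score, candidate C receives heat even though it has no direct link to the seed, which is how indirect paths contribute to the ranking.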
Gene coexpression relationships that are phylogenetically conserved between human and mouse have been shown to provide important clues about gene function that can be efficiently used to identify promising candidate genes for human hereditary disorders. In the past, such approaches have considered mostly generic gene expression profiles that cover multiple tissues and organs. The individual genes of multicellular organisms, however, can participate in different transcriptional programs, operating at scales as different as single-cell types, tissues, organs, body regions or the entire organism. Therefore, systematic analysis of tissue-specific coexpression could be, in principle, a very powerful strategy to dissect those functional relationships among genes that emerge only in particular tissues or organs. In this report, we show that, in fact, conserved coexpression as determined from tissue-specific and condition-specific data sets can predict many functional relationships that are not detected by analyzing heterogeneous microarray data sets. More importantly, we find that, when combined with disease networks, the simultaneous use of both generic (multi-tissue) and tissue-specific conserved coexpression allows a more efficient prediction of human disease genes than the use of generic conserved coexpression alone. Using this strategy, we were able to identify high-probability candidates for 238 orphan disease loci. We provide proof of concept that this combined use of generic and tissue-specific conserved coexpression can be very useful to prioritize the mutational candidates obtained from deep-sequencing projects, even in the case of genetic disorders as heterogeneous as XLMR.
disease-gene prediction; functional annotation; transcriptome; phenome
Here we demonstrate that protein-coding RNA transcripts can crosstalk by competing for common microRNAs, with microRNA response elements as the foundation of this interaction. We have termed such RNA transcripts competing endogenous RNAs (ceRNAs). We tested this hypothesis in the context of PTEN, a key tumor suppressor whose abundance determines critical outcomes in tumorigenesis. By a combined computational and experimental approach, we identified and validated endogenous protein-coding transcripts that regulate PTEN, antagonize PI3K/AKT signaling and possess growth and tumor suppressive properties. Notably, we also show that these genes display concordant expression patterns with PTEN and copy number loss in cancers. Our study presents a road map for the prediction and validation of ceRNA activity and networks, and thus imparts a trans-regulatory function to protein-coding mRNAs.
We recently proposed that competitive endogenous RNAs (ceRNAs) sequester microRNAs to regulate mRNA transcripts containing common microRNA recognition elements (MREs). However, the functional role of ceRNAs in cancer remains unknown. Loss of PTEN, a tumor suppressor regulated by ceRNA activity, frequently occurs in melanoma. Here, we report the discovery of significant enrichment of putative PTEN ceRNAs among genes whose loss accelerates tumorigenesis following Sleeping Beauty insertional mutagenesis in a mouse model of melanoma. We validated several putative PTEN ceRNAs and further characterized one, the ZEB2 transcript. We show that ZEB2 modulates PTEN protein levels in a microRNA-dependent, protein coding-independent manner. Attenuation of ZEB2 expression activates the PI3K/AKT pathway, enhances cell transformation, and commonly occurs in human melanomas and other cancers expressing low PTEN levels. Our study genetically identifies multiple putative microRNA decoys for PTEN, validates ZEB2 mRNA as a bona fide PTEN ceRNA, and demonstrates that abrogated ZEB2 expression cooperates with BRAFV600E to promote melanomagenesis.
We sought exonic transcriptional regulatory elements by shotgun cloning human cDNA fragments into luciferase reporter vectors and measuring the resulting expression levels in liver cells. We uncovered seven regulatory elements within coding regions and three within 3' untranslated regions (UTRs). Two of the putative regulatory elements were enhancers and eight were silencers. The regulatory elements were generally but not consistently evolutionarily conserved and also showed a trend toward decreased population diversity. Furthermore, the exonic regulatory elements were enriched in known transcription factor binding sites (TFBSs) and were associated with several histone modifications and transcriptionally relevant chromatin. Evidence was obtained for bidirectional cis-regulation of a coding region element within a tubulin gene, TUBA1B, by the transcription factors PPARA and RORA. We estimate that hundreds of exonic transcriptional regulatory elements exist, an unexpected finding that highlights a surprising multi-functionality of sequences in the human genome.
Among thousands of long non-coding RNAs (lncRNAs), only a small subset is functionally characterized, and the functional annotation of lncRNAs on the genomic scale remains inadequate. In this study we computationally characterized two functionally different parts of the human lncRNA transcriptome based on their ability to bind the polycomb repressive complex, PRC2. This classification is enabled by the fact that, while all lncRNAs constitute a diverse set of sequences, the classes of PRC2-binding and PRC2 non-binding lncRNAs possess characteristic combinations of sequence-structure patterns and, therefore, can be separated within the feature space. Based on the specific combination of features, we built several machine-learning classifiers and identified the SVM-based classifier as the best performing. We further showed that the SVM-based classifier is able to generalize to independent data sets. We observed that this classifier, trained on human lncRNAs, can predict up to 59.4% of PRC2-binding lncRNAs in mice. This suggests that, despite the low degree of sequence conservation, many lncRNAs play functionally conserved biological roles.
Data normalization is a crucial preliminary step in analyzing genomic datasets. The goal of normalization is to remove global variation to make readings across different experiments comparable. In addition, most genomic loci have non-uniform sensitivity to any given assay because of variation in local sequence properties. In microarray experiments, this non-uniform sensitivity is due to different DNA hybridization and cross-hybridization efficiencies, known as the probe effect. In this paper we introduce a new scheme, called Group Normalization (GN), to remove both global and local biases in one integrated step, whereby we determine the normalized probe signal by finding a set of reference probes with similar responses. Compared to conventional normalization methods such as Quantile normalization and physically motivated probe effect models, our proposed method is general in the sense that it does not require the assumption that the underlying signal distribution be identical for the treatment and control, and is flexible enough to correct for nonlinear and higher order probe effects. The Group Normalization algorithm is computationally efficient and easy to implement. We also describe a variant of the Group Normalization algorithm, called Cross Normalization, which efficiently amplifies biologically relevant differences between any two genomic datasets.
MicroRNAs (miRNAs) have emerged as fundamental regulators that silence gene expression at the post-transcriptional and translational levels. The identification of their targets is a major challenge to elucidate the regulated biological processes. The overall effect of miRNA is reflected on target mRNA expression, suggesting the design of new investigative methods based on high-throughput experimental data such as miRNA and transcriptome profiles. We propose a novel statistical measure of non-linear dependence between miRNA and mRNA expression, in order to infer miRNA-target interactions. This approach, which we name antagonism pattern detection, is based on the statistical recognition of a triangular-shaped pattern in miRNA-target expression profiles. This pattern is observed in miRNA-target expression measurements since their simultaneously elevated expression is statistically under-represented in the case of miRNA silencing effect. The proposed method enables miRNA target prediction to strongly rely on cellular context and physiological conditions reflected by expression data. The procedure has been assessed on synthetic datasets and tested on a set of real positive controls. Then it has been applied to analyze expression data from Ewing’s sarcoma patients. The antagonism relationship is evaluated as a good indicator of real miRNA-target biological interaction. The predicted targets are consistently enriched for miRNA binding site motifs in their 3′UTR. Moreover, we reveal sets of predicted targets for each miRNA sharing important biological function. The procedure allows us to infer crucial miRNA regulators and their potential targets in Ewing’s sarcoma disease. It can be considered as a valid statistical approach to discover new insights in the miRNA regulatory mechanisms.
A universal cancer biomarker candidate for diagnosis is supposed to distinguish, within a broad range of tumors, between healthy and diseased patients. Recently published studies have explored the universal usefulness of some biomarkers in human tumors. In this study, we present an integrative approach to search for potential common cancer biomarkers. Using the TFactS web-tool with a catalogue of experimentally established gene regulations, we could predict transcription factors (TFs) regulated in 305 different human cancer cell lines covering a large panel of tumor types. We also identified chromosomal regions having significant copy number variation (CNV) in these cell lines. Within the scope of the TFactS catalogue, 88 TFs whose activity status was explained by their gene expression and CNVs were identified. Their minimal connected network (MCN) of protein-protein interactions forms a significant module within the human curated TF proteome. Functional analysis of the proteins included in this MCN revealed enrichment in cancer pathways as well as inflammation. The ten most central proteins in the MCN are TFs that trans-regulate 157 known genes encoding secreted and transmembrane proteins. In publicly available collections of gene expression data from 8,525 patient tissues, 86 genes were differentially regulated in cancer compared to inflammatory diseases and controls. From TCGA cancer gene expression data sets, 50 genes were significantly associated with patient survival in at least one tumor type. Enrichment analysis shows that these genes mechanistically interact in common cancer pathways. Among these cancer biomarker candidates, TFRC, MET and VEGFA are commonly amplified genes in tumors and their encoded proteins stained positive in more than 80% of malignancies from public databases. They are linked to angiogenesis and hypoxia, which are common in cancer. They could be interesting for further investigations in cancer diagnostic strategies.
Expression levels of mRNAs are regulated by microRNAs, among other factors. A particular microRNA can bind specifically to several target mRNAs and lead to their degradation. Expression levels of both mRNAs and microRNAs can be obtained by microarray experiments. In order to increase the power of detecting microRNAs that are differentially expressed between two groups of samples, we incorporate the expression levels of their related target gene sets. Group effects are determined individually for each microRNA, and by enrichment tests and global tests for target gene sets. The resulting lists of p-values from individual and set-wise testing are combined by means of meta-analysis. We propose a new approach that connects microRNA-wise and gene set-wise information through p-value combination, as often used in meta-analysis. In this context, we evaluate the usefulness of different approaches to gene set testing. In a simulation study we show that our combination approach is more powerful than microRNA-wise testing alone. Furthermore, we show that combining microRNA-wise results with ‘competitive’ gene set tests maintains a pre-specified false discovery rate. In contrast, a combination with ‘self-contained’ gene set tests can harm the false discovery rate, particularly when gene sets are not disjoint.
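As an illustration of p-value combination, the sketch below implements Fisher's method, one common choice in meta-analysis. The abstract does not state which combination rule is used, so Fisher's method is an assumption here, as is the function name `fisher_combine`. The method also assumes the combined p-values are independent, which may not hold between a microRNA and its target gene set.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For an even
    number of degrees of freedom the survival function has a closed form:
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    """
    if not p_values or any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i  # builds (X/2)^i / i! incrementally
        total += term
    return min(1.0, math.exp(-half_stat) * total)
```

For one microRNA, a call such as `fisher_combine([p_mirna, p_gene_set])` would yield a single combined p-value, which could then be passed to a multiple-testing correction across all microRNAs.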
MicroRNAs are a class of small RNA molecules that mediate gene expression at the post-transcriptional and translational levels. Most well-established high-throughput discovery platforms, such as microarrays, real-time quantitative PCR, and sequencing, have been adapted to study microRNAs in various human diseases. The total number of microRNAs in humans is approximately 1,800, which challenges analytical methodologies that require a large number of entries. Unlike messenger RNAs, the majority of microRNAs (60%) are present at relatively low abundance in cells. When analyzed by microarray, the signals of these low-expressed microRNAs are influenced by non-specific signals, including background noise. It is therefore crucial to distinguish true microRNA signals from measurement errors in microRNA array data analysis. In this study, we propose a novel measurement error model-based normalization method and a method for detecting differentially expressed microRNAs in profiling data acquired from locked nucleic acid (LNA) microRNA arrays. Compared with some existing methods, the proposed method significantly improves detection among low-expressed microRNAs when assessed by quantitative real-time PCR assay.
Trabectedin, a new antitumor compound originally derived from a marine tunicate, is clinically effective in soft tissue sarcoma. The drug has shown high selectivity for myxoid liposarcoma, which is characterized by the translocation t(12;16)(q13;p11) leading to expression of the FUS-CHOP fusion gene. Trabectedin appears to act by interfering with mechanisms of transcription regulation. In particular, the transactivating activity of FUS-CHOP was found to be impaired by trabectedin treatment. Even after a prolonged response, resistance occurs, and it is therefore important to elucidate the mechanisms of resistance to trabectedin. To this end we developed and characterized a myxoid liposarcoma cell line resistant to trabectedin (402-91/ET), obtained by exposing the parental 402-91 cell line to stepwise increases in drug concentration. The aim of this study was to compare the mRNA, miRNA and protein profiles of 402-91 and 402-91/ET cells through a systems biology approach. We identified 3,083 genes, 47 miRNAs and 336 proteins differentially expressed between the 402-91 and 402-91/ET cell lines. Interestingly, three of the differentially expressed miRNAs, miR-130a, miR-21 and miR-7, harbored CHOP binding sites in their promoter regions. We used computational approaches to integrate the three regulatory layers and to generate a molecular map describing the altered circuits in sensitive and resistant cell lines. By combining transcriptomic and proteomic data, we reconstructed two different networks, i.e. apoptosis and cell cycle regulation, that could play a key role in modulating trabectedin resistance. This approach highlights the central role of genes such as CCND1, RB1, E2F4, TNF, CDKN1C and ABL1 in both pre- and post-transcriptional regulatory networks. Validation of these results in in vivo models might be clinically relevant for stratifying myxoid liposarcoma patients with different sensitivity to trabectedin treatment.
Various methods of reconstructing transcriptional regulatory networks infer transcriptional regulatory interactions (TRIs) between strongly coexpressed gene pairs (as determined from microarray experiments measuring mRNA levels). However, the coexpression of two genes might instead imply that they are coregulated by one or more transcription factors (TFs) and do not necessarily share a direct regulatory interaction. We explore whether and under what circumstances gene pairs with a high degree of coexpression are more likely to indicate TRIs, coregulation or both. We use established TRIs in combination with microarray expression data from both Escherichia coli (a prokaryote) and Saccharomyces cerevisiae (a eukaryote) to assess the accuracy of predictions of coregulated gene pairs and TRIs from coexpressed gene pairs. We find that coexpressed gene pairs are more likely to indicate coregulation than TRIs for Saccharomyces cerevisiae, but the incidence of TRIs in highly coexpressed gene pairs is higher for Escherichia coli. The data processing inequality (DPI) has previously been applied to the inference of TRIs. We consider the case where a transcription factor gene is known to regulate two genes (one of which is a transcription factor gene) that are known not to regulate one another. According to the DPI, the non-interacting gene pair should have the smallest mutual information among all pairs in the triplet. While this is sometimes the case for Escherichia coli, we find that it is almost never the case for Saccharomyces cerevisiae. This calls into question the usefulness of the DPI, which is sometimes employed to infer TRIs from expression data. Finally, we observe that when a TF gene is known to regulate two other genes, it is rarely the case that one regulatory interaction is positively correlated and the other is negatively correlated. Typically both are either positively or negatively correlated.
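The DPI check described above can be sketched for a single triplet as follows. This is a minimal illustration under stated assumptions: mutual information is estimated with a simple plug-in estimator on equal-width bins, which may differ from the estimator used in the study, and the function names and the `n_bins` parameter are hypothetical.

```python
import math
from collections import Counter

def discretize(values, n_bins=3):
    """Equal-width binning of one expression profile into integer labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant profiles fall into one bin
    return [min(n_bins - 1, int((v - lo) / width)) for v in values]

def mutual_information(x, y, n_bins=3):
    """Plug-in mutual information estimate (in nats) from binned profiles."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum((c / n) * math.log(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())

def dpi_holds(tf, gene_a, gene_b, n_bins=3):
    """True if the non-interacting pair (gene_a, gene_b) has the smallest
    mutual information of the three pairs in the triplet."""
    mi_ab = mutual_information(gene_a, gene_b, n_bins)
    return mi_ab <= min(mutual_information(tf, gene_a, n_bins),
                        mutual_information(tf, gene_b, n_bins))
```

Applied over all triplets in which a TF gene regulates two genes that do not regulate each other, the fraction of triplets for which `dpi_holds` returns `True` gives a rough measure of how often the DPI assumption is met in a given organism's data.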