The development of new therapies for orphan genetic diseases represents an extremely important medical and social challenge. Drug repositioning, i.e. finding new indications for approved drugs, could be one of the most cost- and time-effective strategies to cope with this problem, at least in a subset of cases. Therefore, many computational approaches based on the analysis of high throughput gene expression data have so far been proposed to reposition available drugs. However, most of these methods require gene expression profiles directly relevant to the pathologic conditions under study, such as those obtained from patient cells and/or from suitable experimental models. In this work we have developed a new approach for drug repositioning, based on identifying known drug targets showing conserved anti-correlated expression profiles with human disease genes, which is completely independent from the availability of ‘ad hoc’ gene expression data-sets.
By analyzing available data, we provide evidence that the genes displaying conserved anti-correlation with drug targets are antagonistically modulated in their expression by treatment with the relevant drugs. We then identified clusters of genes associated with similar phenotypes and showing conserved anti-correlation with drug targets. On this basis, we generated a list of potential candidate drug-disease associations. Importantly, we show that some of the proposed associations are already supported by independent experimental evidence.
Our results support the hypothesis that the identification of gene clusters showing conserved anti-correlation with drug targets can be an effective method for drug repositioning and provide a wide list of new potential drug-disease associations for experimental validation.
In pluripotent stem cells, there is increasing evidence for crosstalk between post-transcriptional and transcriptional networks, offering multifold steps at which pluripotency can be controlled. In addition to well-studied transcription factors, chromatin modifiers and miRNAs, RNA-binding proteins are emerging as fundamental players in pluripotency regulation. Here, we report a new role for the RNA-binding protein ESRP1 in the control of pluripotency. Knockdown of Esrp1 in mouse embryonic stem cells induces, other than the well-documented epithelial to mesenchymal-like state, also an increase in expression of the core transcription factors Oct4, Nanog and Sox2, thereby enhancing self-renewal of these cells. Esrp1-depleted embryonic stem cells displayed impaired early differentiation in vitro and formed larger teratomas in vivo when compared to control embryonic stem cells. We also show that ESRP1 binds to Oct4 and Sox2 mRNAs and decreases their polysomal loading. ESRP1 thus acts as a physiological regulator of the finely-tuned balance between self-renewal and commitment to a restricted developmental fate. Importantly, both mouse and human epithelial stem cells highly express ESRP1, pinpointing the importance of this RNA-binding protein in stem cell biology.
RNAseq and microarray methods are frequently used to measure gene expression levels. While similar in purpose, the two technologies differ fundamentally. Here, we present the largest comparative study between microarray and RNAseq methods to date using The Cancer Genome Atlas (TCGA) data. We found high correlations between expression data obtained from the Affymetrix one-channel microarray and RNAseq (Spearman correlation coefficients of ∼0.8). We also observed that low abundance genes had poorer correlations between microarray and RNAseq data than high abundance genes. As expected, due to measurement and normalization differences, Agilent two-channel microarray and RNAseq data were poorly correlated (Spearman correlation coefficients of only ∼0.2). By examining the differentially expressed genes between tumor and normal samples we observed reasonable concordance in directionality between Agilent two-channel microarray and RNAseq data, although a small group of genes was found to have expression changes reported in opposite directions by these two technologies. Overall, RNAseq produces comparable results to microarray technologies in terms of expression profiling. The RNAseq normalization methods RPKM and RSEM produce similar results on the gene level and reasonably concordant results on the exon level. Longer exons tended to have better concordance between the two normalization methods than shorter exons.
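The platform comparison above rests on Spearman rank correlation. The following is a minimal sketch of that statistic using only the Python standard library. The expression values are hypothetical and do not come from TCGA, and the function names are illustrative rather than taken from the study.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values for five genes measured on two platforms
microarray = [7.1, 9.4, 5.2, 11.0, 8.3]      # log-scale intensities (made up)
rnaseq = [12.0, 85.0, 3.5, 400.0, 30.0]      # normalized counts (made up)
rho = spearman(microarray, rnaseq)
```

Because only ranks enter the calculation, the statistic is unaffected by the different scales of the two platforms, which is one reason it suits cross-technology comparisons.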
Identifying subspace gene clusters from gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose to use low-rank representation (LRR) to identify subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all the candidates that can represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted from the block-diagonal representation matrix obtained using LRR, and they capture well the intrinsic patterns of genes with similar functions. Meanwhile, the parameter of LRR can balance the effect of noise, so that the method is capable of extracting useful information from data with a high level of background noise. Compared with traditional methods, our approach can identify genes with similar functions even when their expression profiles are not similar. It can also assign one gene to different clusters. Moreover, our method is robust to noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the results show that the LRR-based method is superior to existing methods for identifying subspace gene clusters.
Genomic copy number alterations are common in cancer. Finding the genes causally implicated in oncogenesis is challenging because the gain or loss of a chromosomal region may affect a few key driver genes and many passengers. Integrative analyses have opened new vistas for addressing this issue. One approach is to identify genes with frequent copy number alterations and corresponding changes in expression. Several methods also analyse effects of transcriptional changes on known pathways. Here, we propose a method that analyses in-cis correlated genes for evidence of in-trans association to biological processes, with no bias towards processes of a particular type or function. The method aims to identify cis-regulated genes for which the expression correlation to other genes provides further evidence of a network-perturbing role in cancer. The proposed unsupervised approach involves a sequence of statistical tests to systematically narrow down the list of relevant genes, based on integrative analysis of copy number and gene expression data. A novel adjustment method handles confounding effects of co-occurring copy number aberrations, potentially a large source of false positives in such studies. Applying the method to whole-genome copy number and expression data from 100 primary breast carcinomas, 6373 genes were identified as commonly aberrant, 578 were highly in-cis correlated, and 56 were in addition associated in-trans to biological processes. Among these in-trans process associated and cis-correlated (iPAC) genes, 28% have previously been reported as breast cancer associated, and 64% as cancer associated. By combining statistical evidence from three separate subanalyses that focus respectively on copy number, gene expression and the combination of the two, the proposed method identifies several known and novel cancer driver candidates. Validation in an independent data set supports the conclusion that the method identifies genes implicated in cancer.
The Dlx and Msx homeodomain transcription factors play important roles in the control of limb development. The combined disruption of Msx1 and Msx2, as well as that of Dlx5 and Dlx6, leads to limb patterning defects with anomalies in digit number and shape. Msx1;Msx2 double mutants are characterized by the loss of derivatives of the anterior limb mesoderm, which is not observed in either of the single mutants. Dlx5;Dlx6 double mutants exhibit hindlimb ectrodactyly. While the morphogenetic action of Msx genes seems to involve the BMP molecules, the mode of action of Dlx genes still remains elusive. Here, examining the limb phenotypes of combined Dlx and Msx mutants, we reveal a new Dlx-Msx regulatory loop directly involving BMPs. In Msx1;Dlx5;Dlx6 triple mutant mice (TKO), besides the expected ectrodactyly, we also observe the hallmark morphological anomalies of Msx1;Msx2 double mutants, suggesting an epistatic role of Dlx5 and Dlx6 over Msx2. In Msx2;Dlx5;Dlx6 TKO mice we only observe an aggravation of the ectrodactyly defect without changes in the number of the individual components of the limb. Using a combination of qPCR, ChIP and bioinformatic analyses, we identify two Dlx/Msx regulatory pathways: 1) in the anterior limb mesoderm a non-cell autonomous Msx-Dlx regulatory loop involves BMP molecules through the AER and 2) in AER cells and, at later stages, in the limb mesoderm the regulation of Msx2 by Dlx5 and Dlx6 occurs also cell autonomously. These data bring new elements to decipher the complex AER-mesoderm dialogue that takes place during limb development and provide clues to understanding the etiology of congenital limb malformations.
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have successfully exploited this concept by capturing the interaction of genes or proteins in a score. Nonetheless, most current approaches suffer from at least some of the following limitations: (1) networks comprise only curated physical interactions, leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering the full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest-path scores mostly on single-source networks. Heterogeneous associations consistently delivered superior performance over single-source data across the majority of methods. Results on the contribution of confidence weights were inconclusive.
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
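Heat diffusion ranking can be illustrated on a toy network. The sketch below assumes a symmetric weighted adjacency matrix, the combinatorial Laplacian L = D - W, and a truncated Taylor series for exp(-tL). The weights, the diffusion time `t` and all function names are hypothetical, and the actual kernel and parameters used in the study may differ.

```python
def laplacian(weights):
    """Graph Laplacian L = D - W for a symmetric weighted adjacency matrix."""
    n = len(weights)
    return [[(sum(weights[i]) if i == j else 0.0) - weights[i][j]
             for j in range(n)] for i in range(n)]

def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def heat_diffusion(weights, seeds, t=0.5, terms=30):
    """Approximate exp(-t * L) applied to the seed vector by a Taylor series."""
    lap = laplacian(weights)
    term = list(seeds)      # k = 0 term of the series
    result = list(seeds)
    for k in range(1, terms + 1):
        # next term: (-t * L / k) applied to the previous term
        term = [-t * x / k for x in mat_vec(lap, term)]
        result = [r + x for r, x in zip(result, term)]
    return result

# Toy path network gene0 - gene1 - gene2 - gene3 with made-up confidence weights
W = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.8, 0.0],
    [0.0, 0.8, 0.0, 0.4],
    [0.0, 0.0, 0.4, 0.0],
]
seeds = [1.0, 0.0, 0.0, 0.0]   # gene0 plays the role of a known disease gene
scores = heat_diffusion(W, seeds)
# Rank the non-seed genes by the amount of heat they received
ranking = sorted(range(1, 4), key=lambda g: scores[g], reverse=True)
```

Unlike neighborhood or shortest-path scores, the diffused heat reaches every gene connected to a seed by any path, so indirect routes and edge weights both contribute to the ranking.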
Gene coexpression relationships that are phylogenetically conserved between human and mouse have been shown to provide important clues about gene function that can be efficiently used to identify promising candidate genes for human hereditary disorders. In the past, such approaches have considered mostly generic gene expression profiles that cover multiple tissues and organs. The individual genes of multicellular organisms, however, can participate in different transcriptional programs, operating at scales as different as single-cell types, tissues, organs, body regions or the entire organism. Therefore, systematic analysis of tissue-specific coexpression could be, in principle, a very powerful strategy to dissect those functional relationships among genes that emerge only in particular tissues or organs. In this report, we show that, in fact, conserved coexpression as determined from tissue-specific and condition-specific data sets can predict many functional relationships that are not detected by analyzing heterogeneous microarray data sets. More importantly, we find that, when combined with disease networks, the simultaneous use of both generic (multi-tissue) and tissue-specific conserved coexpression allows a more efficient prediction of human disease genes than the use of generic conserved coexpression alone. Using this strategy, we were able to identify high-probability candidates for 238 orphan disease loci. We provide proof of concept that this combined use of generic and tissue-specific conserved coexpression can be very useful to prioritize the mutational candidates obtained from deep-sequencing projects, even in the case of genetic disorders as heterogeneous as XLMR.
disease-gene prediction; functional annotation; transcriptome; phenome
Here we demonstrate that protein-coding RNA transcripts can crosstalk by competing for common microRNAs, with microRNA response elements as the foundation of this interaction. We have termed such RNA transcripts as competing endogenous RNAs (ceRNAs). We tested this hypothesis in the context of PTEN, a key tumor suppressor whose abundance determines critical outcomes in tumorigenesis. By a combined computational and experimental approach, we identified and validated endogenous protein-coding transcripts that regulate PTEN, antagonize PI3K/AKT signaling and possess growth and tumor suppressive properties. Notably, we also show that these genes display concordant expression patterns with PTEN and copy number loss in cancers. Our study presents a road map for the prediction and validation of ceRNA activity and networks, and thus imparts a trans-regulatory function to protein-coding mRNAs.
We recently proposed that competitive endogenous RNAs (ceRNAs) sequester microRNAs to regulate mRNA transcripts containing common microRNA recognition elements (MREs). However, the functional role of ceRNAs in cancer remains unknown. Loss of PTEN, a tumor suppressor regulated by ceRNA activity, frequently occurs in melanoma. Here, we report the discovery of significant enrichment of putative PTEN ceRNAs among genes whose loss accelerates tumorigenesis following Sleeping Beauty insertional mutagenesis in a mouse model of melanoma. We validated several putative PTEN ceRNAs and further characterized one, the ZEB2 transcript. We show that ZEB2 modulates PTEN protein levels in a microRNA-dependent, protein coding-independent manner. Attenuation of ZEB2 expression activates the PI3K/AKT pathway, enhances cell transformation, and commonly occurs in human melanomas and other cancers expressing low PTEN levels. Our study genetically identifies multiple putative microRNA decoys for PTEN, validates ZEB2 mRNA as a bona fide PTEN ceRNA, and demonstrates that abrogated ZEB2 expression cooperates with BRAFV600E to promote melanomagenesis.
We sought exonic transcriptional regulatory elements by shotgun cloning human cDNA fragments into luciferase reporter vectors and measuring the resulting expression levels in liver cells. We uncovered seven regulatory elements within coding regions and three within 3' untranslated regions (UTRs). Two of the putative regulatory elements were enhancers and eight were silencers. The regulatory elements were generally but not consistently evolutionarily conserved and also showed a trend toward decreased population diversity. Furthermore, the exonic regulatory elements were enriched in known transcription factor binding sites (TFBSs) and were associated with several histone modifications and transcriptionally relevant chromatin. Evidence was obtained for bidirectional cis-regulation of a coding region element within a tubulin gene, TUBA1B, by the transcription factors PPARA and RORA. We estimate that hundreds of exonic transcriptional regulatory elements exist, an unexpected finding that highlights a surprising multi-functionality of sequences in the human genome.
Among thousands of long non-coding RNAs (lncRNAs) only a small subset is functionally characterized, and the functional annotation of lncRNAs on the genomic scale remains inadequate. In this study we computationally characterized two functionally different parts of the human lncRNA transcriptome based on their ability to bind the polycomb repressive complex, PRC2. This classification is enabled by the fact that while all lncRNAs constitute a diverse set of sequences, the classes of PRC2-binding and PRC2 non-binding lncRNAs possess characteristic combinations of sequence-structure patterns and, therefore, can be separated within the feature space. Based on the specific combination of features, we built several machine-learning classifiers and identified the SVM-based classifier as the best performing. We further showed that the SVM-based classifier is able to generalize to independent data sets. We observed that this classifier, trained on human lncRNAs, can predict up to 59.4% of PRC2-binding lncRNAs in mice. This suggests that, despite the low degree of sequence conservation, many lncRNAs play functionally conserved biological roles.
Data normalization is a crucial preliminary step in analyzing genomic datasets. The goal of normalization is to remove global variation to make readings across different experiments comparable. In addition, most genomic loci have non-uniform sensitivity to any given assay because of variation in local sequence properties. In microarray experiments, this non-uniform sensitivity is due to different DNA hybridization and cross-hybridization efficiencies, known as the probe effect. In this paper we introduce a new scheme, called Group Normalization (GN), to remove both global and local biases in one integrated step, whereby we determine the normalized probe signal by finding a set of reference probes with similar responses. Compared to conventional normalization methods such as Quantile normalization and physically motivated probe effect models, our proposed method is general in the sense that it does not require the assumption that the underlying signal distribution be identical for the treatment and control, and is flexible enough to correct for nonlinear and higher order probe effects. The Group Normalization algorithm is computationally efficient and easy to implement. We also describe a variant of the Group Normalization algorithm, called Cross Normalization, which efficiently amplifies biologically relevant differences between any two genomic datasets.
MicroRNAs (miRNAs) have emerged as fundamental regulators that silence gene expression at the post-transcriptional and translational levels. The identification of their targets is a major challenge to elucidate the regulated biological processes. The overall effect of miRNA is reflected on target mRNA expression, suggesting the design of new investigative methods based on high-throughput experimental data such as miRNA and transcriptome profiles. We propose a novel statistical measure of non-linear dependence between miRNA and mRNA expression, in order to infer miRNA-target interactions. This approach, which we name antagonism pattern detection, is based on the statistical recognition of a triangular-shaped pattern in miRNA-target expression profiles. This pattern is observed in miRNA-target expression measurements since their simultaneously elevated expression is statistically under-represented in the case of miRNA silencing effect. The proposed method enables miRNA target prediction to strongly rely on cellular context and physiological conditions reflected by expression data. The procedure has been assessed on synthetic datasets and tested on a set of real positive controls. Then it has been applied to analyze expression data from Ewing’s sarcoma patients. The antagonism relationship is evaluated as a good indicator of real miRNA-target biological interaction. The predicted targets are consistently enriched for miRNA binding site motifs in their 3′UTR. Moreover, we reveal sets of predicted targets for each miRNA sharing important biological function. The procedure allows us to infer crucial miRNA regulators and their potential targets in Ewing’s sarcoma disease. It can be considered as a valid statistical approach to discover new insights in the miRNA regulatory mechanisms.
A universal cancer biomarker candidate for diagnosis is supposed to distinguish, within a broad range of tumors, between healthy and diseased patients. Recently published studies have explored the universal usefulness of some biomarkers in human tumors. In this study, we present an integrative approach to search for potential common cancer biomarkers. Using the TFactS web-tool with a catalogue of experimentally established gene regulations, we could predict transcription factors (TFs) regulated in 305 different human cancer cell lines covering a large panel of tumor types. We also identified chromosomal regions having significant copy number variation (CNV) in these cell lines. Within the scope of the TFactS catalogue, 88 TFs whose activity status was explained by their gene expression and CNVs were identified. Their minimal connected network (MCN) of protein-protein interactions forms a significant module within the human curated TF proteome. Functional analysis of the proteins included in this MCN revealed enrichment in cancer pathways as well as inflammation. The ten most central proteins in the MCN are TFs that trans-regulate 157 known genes encoding secreted and transmembrane proteins. In publicly available collections of gene expression data from 8,525 patient tissues, 86 genes were differentially regulated in cancer compared to inflammatory diseases and controls. From TCGA cancer gene expression data sets, 50 genes were significantly associated with patient survival in at least one tumor type. Enrichment analysis shows that these genes mechanistically interact in common cancer pathways. Among these cancer biomarker candidates, TFRC, MET and VEGFA are commonly amplified genes in tumors and their encoded proteins stained positive in more than 80% of malignancies from public databases. They are linked to angiogenesis and hypoxia, which are common in cancer. They could be interesting for further investigations in cancer diagnostic strategies.
Expression levels of mRNAs are regulated by microRNAs, among other factors. A particular microRNA can bind specifically to several target mRNAs and lead to their degradation. Expression levels of both mRNAs and microRNAs can be obtained by microarray experiments. In order to increase the power of detecting microRNAs that are differentially expressed between two different groups of samples, we incorporate expression levels of their related target gene sets. Group effects are determined individually for each microRNA, and by enrichment tests and global tests for target gene sets. The resulting lists of p-values from individual and set-wise testing are combined by means of meta-analysis. We propose a new approach to connect microRNA-wise and gene set-wise information by means of p-value combination as often used in meta-analysis. In this context, we evaluate the usefulness of different approaches of gene set tests. In a simulation study we reveal that our combination approach is more powerful than microRNA-wise testing alone. Furthermore, we show that combining microRNA-wise results with ‘competitive’ gene set tests maintains a pre-specified false discovery rate. In contrast, a combination with ‘self-contained’ gene set tests can harm the false discovery rate, particularly when gene sets are not disjoint.
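To make the p-value combination step concrete, the sketch below uses Fisher's method, one common combination rule in meta-analysis. The abstract does not say which rule the authors used, so this choice is an assumption. The two input p-values are hypothetical. Fisher's method also assumes independent tests, which a microRNA-wise test and its target gene set test may not satisfy.

```python
import math

def chi2_sf_even_df(x, df):
    """Chi-square survival function for even degrees of freedom (closed form)."""
    k = df // 2
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i      # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def fisher_combine(p_values):
    """Combine p-values (all > 0) with Fisher's method: -2 * sum(log p) ~ chi2(2k)."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(stat, 2 * len(p_values))

# Hypothetical evidence for one microRNA
p_mirna = 0.05       # microRNA-wise test (made up)
p_gene_set = 0.05    # target gene set test (made up)
p_combined = fisher_combine([p_mirna, p_gene_set])
```

For two p-values the combined value equals p1·p2·(1 − ln(p1·p2)), so two borderline results of 0.05 combine to roughly 0.017.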
MicroRNAs are small RNA molecules that mediate gene expression at the post-transcriptional/translational level. Most well-established high-throughput discovery platforms, such as microarrays, real-time quantitative PCR, and sequencing, have been adapted to study microRNAs in various human diseases. The total number of microRNAs in humans is approximately 1,800, which challenges some analytical methodologies requiring a large number of entries. Unlike messenger RNAs, the majority of microRNAs (60%) maintain relatively low abundance in cells. When analyzed using microarrays, the signals of these low-expressed microRNAs are influenced by other non-specific signals, including the background noise. It is crucial to distinguish the true microRNA signals from measurement errors in microRNA array data analysis. In this study, we propose a novel measurement error model-based normalization method and differentially-expressed microRNA detection method for microRNA profiling data acquired from locked nucleic acid (LNA) microRNA arrays. Compared with some existing methods, the proposed method significantly improves the detection among low-expressed microRNAs when assessed by quantitative real-time PCR assay.
Trabectedin, a new antitumor compound originally derived from a marine tunicate, is clinically effective in soft tissue sarcoma. The drug has shown a high selectivity for myxoid liposarcoma, characterized by the translocation t(12;16)(q13;p11), leading to the expression of the FUS-CHOP fusion gene. Trabectedin appears to act by interfering with mechanisms of transcription regulation. In particular, the transactivating activity of FUS-CHOP was found to be impaired by trabectedin treatment. Even after a prolonged response, resistance occurs, and it is therefore important to elucidate the mechanisms of resistance to trabectedin. To this end we developed and characterized a myxoid liposarcoma cell line resistant to trabectedin (402-91/ET), obtained by exposing the parental 402-91 cell line to stepwise increases in drug concentration. The aim of this study was to compare the mRNA, miRNA and protein profiles of 402-91 and 402-91/ET cells through a systems biology approach. We identified 3,083 genes, 47 miRNAs and 336 proteins differentially expressed between 402-91 and 402-91/ET cell lines. Interestingly, three miRNAs among those differentially expressed, miR-130a, miR-21 and miR-7, harbored CHOP binding sites in their promoter region. We used computational approaches to integrate the three regulatory layers and to generate a molecular map describing the altered circuits in sensitive and resistant cell lines. By combining transcriptomic and proteomic data, we reconstructed two different networks, i.e. apoptosis and cell cycle regulation, that could play a key role in modulating trabectedin resistance. This approach highlights the central role of genes such as CCND1, RB1, E2F4, TNF, CDKN1C and ABL1 in both pre- and post-transcriptional regulatory networks. The validation of these results in in vivo models might be clinically relevant to stratify myxoid liposarcoma patients with different sensitivity to trabectedin treatment.
Various methods of reconstructing transcriptional regulatory networks infer transcriptional regulatory interactions (TRIs) between strongly coexpressed gene pairs (as determined from microarray experiments measuring mRNA levels). Alternatively, however, the coexpression of two genes might imply that they are coregulated by one or more transcription factors (TFs), and do not necessarily share a direct regulatory interaction. We explore whether and under what circumstances gene pairs with a high degree of coexpression are more likely to indicate TRIs, coregulation or both. Here we use established TRIs in combination with microarray expression data from both Escherichia coli (a prokaryote) and Saccharomyces cerevisiae (a eukaryote) to assess the accuracy of predictions of coregulated gene pairs and TRIs from coexpressed gene pairs. We find that coexpressed gene pairs are more likely to indicate coregulation than TRIs for Saccharomyces cerevisiae, but the incidence of TRIs in highly coexpressed gene pairs is higher for Escherichia coli. The data processing inequality (DPI) has previously been applied for the inference of TRIs. We consider the case where a transcription factor gene is known to regulate two genes (one of which is a transcription factor gene) that are known not to regulate one another. According to the DPI, the non-interacting gene pairs should have the smallest mutual information among all pairs in the triplets. While this is sometimes the case for Escherichia coli, we find that it is almost always not the case for Saccharomyces cerevisiae. This brings into question the usefulness of the DPI sometimes employed to infer TRIs from expression data. Finally, we observe that when a TF gene is known to regulate two other genes, it is rarely the case that one regulatory interaction is positively correlated and the other interaction is negatively correlated. Typically both are either positively or negatively correlated.
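The data processing inequality test described above can be sketched for a single triplet as follows. The discretized expression profiles are hypothetical and were constructed so that the DPI expectation holds. They only illustrate the rule being tested and do not reproduce the E. coli or yeast findings. The function names and the plug-in mutual information estimator are illustrative assumptions.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed pairs of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def dpi_weakest_pair(profiles):
    """Return the pair with the smallest MI in a gene triplet, plus all three MI values.

    Under the DPI, the weakest pair is the one assumed to have no direct interaction.
    """
    genes = sorted(profiles)
    pairs = [(genes[i], genes[j]) for i in range(3) for j in range(i + 1, 3)]
    mi = {pair: mutual_information(profiles[pair[0]], profiles[pair[1]])
          for pair in pairs}
    return min(mi, key=mi.get), mi

# Hypothetical binarized profiles: a TF and two target genes that follow it with noise
profiles = {
    "TF":    [0, 0, 0, 0, 1, 1, 1, 1],
    "geneA": [0, 0, 0, 1, 1, 1, 1, 1],
    "geneB": [0, 1, 0, 0, 1, 1, 0, 1],
}
weakest, mi_values = dpi_weakest_pair(profiles)
```

Here the two targets share the least mutual information, as the DPI predicts for a pair that is only coregulated. The abstract reports that real yeast data frequently violate this expectation.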
miRNAs are small RNA molecules (~22 nt) that interact with their corresponding target mRNAs, inhibiting the translation of the mRNA into protein and cleaving the target mRNA. This second effect diminishes the overall expression of the target mRNA. Several miRNA-mRNA relationship databases have been deployed, most of them based on sequence complementarity. However, the number of false positives in these databases is large and they do not overlap completely. Recently, it has been proposed to combine expression measurements from both miRNA and mRNA with sequence-based predictions to achieve more accurate relationships. In our work, we use LASSO regression with non-positive constraints to integrate both sources of information. LASSO enforces the sparseness of the solution, and the non-positive constraints restrict the search for miRNA targets to those with down-regulation effects on the mRNA expression. We named this method TaLasso (miRNA-Target LASSO).
We used TaLasso on two public datasets that have paired expression levels of human miRNAs and mRNAs. The top ranked interactions recovered by TaLasso are especially enriched (more than using any other algorithm) in experimentally validated targets. The functions of the genes with mRNA transcripts in the top-ranked interactions are meaningful. This is not the case using other algorithms.
TaLasso is available as Matlab or R code. There is also a web-based tool for human miRNAs at http://talasso.cnb.csic.es/.
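The idea of a LASSO with non-positive constraints can be sketched as a small coordinate descent solver. This is not the TaLasso implementation. It minimizes 0.5·||y − Xb||² + λ·Σ|b_j| subject to b_j ≤ 0 for a single mRNA and ignores the sequence-based predictions that TaLasso also integrates. The data, the penalty `lam` and all names are hypothetical.

```python
def nonpositive_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for a LASSO whose coefficients are constrained to be <= 0.

    X: list of samples, each a list of miRNA expression values.
    y: expression of one mRNA across the same samples.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)                      # equals y - X b for b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col)
            if z == 0.0:
                continue                    # skip an all-zero column
            # inner product of column j with the residual excluding its own term
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, residual))
            # with b_j <= 0 the penalty is -lam * b_j, so the minimizer is
            # (rho + lam) / z, clipped at zero
            new_b = min(0.0, (rho + lam) / z)
            residual = [r - c * (new_b - b[j]) for c, r in zip(col, residual)]
            b[j] = new_b
    return b

# Hypothetical centered expression of two miRNAs across five samples
mir1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
mir2 = [2.0, -1.0, -2.0, -1.0, 2.0]
X = [[a, c] for a, c in zip(mir1, mir2)]
# The mRNA falls when mir1 rises and rises with mir2 (made-up relationship)
y = [-2.0 * a + 1.0 * c for a, c in zip(mir1, mir2)]
coefficients = nonpositive_lasso(X, y, lam=1.0)
```

In this toy case only the first miRNA receives a non-zero (negative) coefficient. The positively associated miRNA is clipped to zero, which mirrors the restriction to down-regulating interactions.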
A major part of the post-transcriptional regulation of gene expression is affected by trans-acting elements, such as microRNAs, binding the 3′ untranslated region (UTR) of their target mRNAs. Proliferating cells partly escape this type of negative regulation by expressing shorter 3′ UTRs, depleted of microRNA binding sites, compared to non-proliferating cells. Using large-scale gene expression datasets, we show that a similar phenomenon takes place in breast and lung cancer: tumors expressing shorter 3′ UTRs tend to be more aggressive and to result in shorter patient survival. Moreover, we show that a gene expression signature based only on the expression ratio of alternative 3′ UTRs is a strong predictor of survival in both tumors. Genes undergoing 3′ UTR shortening in aggressive tumors of the two tissues significantly overlap, and several of them are known to be involved in tumor progression. However, the pattern of 3′ UTR shortening in aggressive tumors in vivo is clearly distinct from analogous patterns involved in proliferation and transformation.
The advent of next generation sequencing (NGS) technologies has revolutionised the way biologists produce, analyse and interpret data. Although NGS platforms provide a cost-effective way to discover genome-wide variants from a single experiment, variants discovered by NGS need follow-up validation due to the high error rates associated with various sequencing chemistries. Recently, whole exome sequencing has been proposed as an affordable option compared to whole genome runs, but it still requires follow-up validation of all the novel exomic variants. Customarily, a consensus approach is used to overcome the systematic errors inherent to the sequencing technology and to the alignment and post-alignment variant detection algorithms. However, this approach requires multiple sequencing chemistries, multiple alignment tools and multiple variant callers, which may not be viable in terms of time and money for individual investigators with limited informatics know-how. Biologists often lack the requisite training to deal with the huge amount of data produced by NGS runs and face difficulty in choosing from the list of freely available analytical tools for NGS data analysis. Hence, there is a need to customise the NGS data analysis pipeline to preferentially retain true variants by minimising the incidence of false positives and to make the choice of the right analytical tools easier. To this end, we have sampled different freely available tools used at the alignment and post-alignment stages, suggesting the use of the most suitable combination determined by a simple framework of pre-existing metrics to create significant datasets.
RegnANN is a novel method for reverse engineering gene networks based on an ensemble of multilayer perceptrons. The algorithm builds a regressor for each gene in the network, estimating its neighborhood independently. The overall network is obtained by joining all the neighborhoods. RegnANN makes no assumptions about the nature of the relationships between the variables, potentially capturing high-order and nonlinear dependencies between expression patterns. The evaluation focuses on synthetic data mimicking plausible submodules of larger networks and on biological data consisting of submodules of Escherichia coli. We consider Barabási and Erdős-Rényi topologies together with two methods for data generation. We verify the effect of factors such as network size and amount of data on the accuracy of the inference algorithm. The accuracy scores obtained with RegnANN are methodically compared with the performance of three reference algorithms: ARACNE, CLR and KELLER. Our evaluation indicates that RegnANN compares favorably with the inference methods tested. The robustness of RegnANN, its ability to discover second order correlations and the agreement between the results obtained with this new method on both synthetic and biological data are promising, and they encourage its application to a wider range of problems.
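The two structural steps described above, one regressor per gene followed by joining the neighborhoods, can be sketched as follows. To keep the example short, a linear regressor fitted by gradient descent is swapped in for the multilayer perceptron, so this is not the RegnANN model itself. The function names, the weight threshold and the union rule for joining neighborhoods are all assumptions made for illustration.

```python
def standardize(column):
    """Rescale a list of values to zero mean and unit variance."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5 or 1.0
    return [(x - mean) / std for x in column]

def fit_weights(inputs, target, steps=500, lr=0.1):
    """Least-squares weights by plain gradient descent from zero.

    The step size is only suitable for small networks (a few genes).
    """
    n = len(target)
    weights = [0.0] * len(inputs)
    for _ in range(steps):
        preds = [sum(w * col[k] for w, col in zip(weights, inputs))
                 for k in range(n)]
        residual = [target[k] - preds[k] for k in range(n)]
        for i, col in enumerate(inputs):
            grad = -2.0 / n * sum(col[k] * residual[k] for k in range(n))
            weights[i] -= lr * grad
    return weights

def infer_network(expression, threshold=0.5):
    """Regress each gene on all others, then join the neighborhoods.

    expression: dict mapping gene name -> list of expression values.
    An undirected edge is kept if either gene selects the other
    (union rule, an assumption of this sketch).
    """
    genes = sorted(expression)
    data = {g: standardize(expression[g]) for g in genes}
    edges = set()
    for target in genes:
        others = [g for g in genes if g != target]
        weights = fit_weights([data[g] for g in others], data[target])
        for g, w in zip(others, weights):
            if abs(w) >= threshold:
                edges.add(frozenset((g, target)))
    return edges

# Toy data: C is proportional to A, B is unrelated to both.
toy = {"A": [-1, 1, -1, 1], "B": [-1, -1, 1, 1], "C": [-2, 2, -2, 2]}
network = infer_network(toy)
```

A linear stand-in cannot capture the nonlinear dependencies that motivate the use of perceptrons; replacing `fit_weights` with a nonlinear regressor and a suitable edge score would be the natural next step.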
Gene-on-gene regulations are key components of every living organism. Dynamical abstract models of genetic regulatory networks help explain the genome's evolvability and robustness. These properties can be attributed to the structural topology of the graph formed by genes, as vertices, and regulatory interactions, as edges. Moreover, the actual interactions of each gene are believed to play a key role in the stability of the structure. With advances in biology, some effort has been devoted to developing update functions for Boolean models that include recent knowledge. We combine real-life gene interaction networks with novel update functions in a Boolean model. We use two sub-networks of biological organisms, the yeast cell-cycle and the mouse embryonic stem cell, as topological support for our system. On these structures, we replace the original random update functions with a novel threshold-based dynamic function in which the promoting and repressing effect of each interaction is considered. We use a third real-life regulatory network, along with its inferred Boolean update functions, to validate the proposed update function. Results of this validation hint at increased biological plausibility of the threshold-based function. To investigate the dynamical behavior of this new model, we visualized the phase transition between order and chaos into the critical regime using Derrida plots. We complement the qualitative nature of Derrida plots with an alternative measure, the criticality distance, which also allows the regimes to be discriminated quantitatively. Simulations on both real-life genetic regulatory networks show that there exists a set of parameters that allows the systems to operate in the critical region. This new model includes experimentally derived biological information and recent discoveries, which makes it potentially useful to guide experimental research.
The update function confers additional realism to the model, while reducing the complexity and solution space, thus making it easier to investigate.
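A threshold-based update that takes promoting and repressing interactions into account can be sketched in a few lines. This is one common form of such a rule, not necessarily the exact function proposed here: the signed weight matrix, the zero threshold and the tie rule (keep the current state) are assumptions of this sketch. The helper `derrida_point` returns a single pair of Hamming distances, before and after one update, of the kind that a Derrida plot aggregates over many state pairs.

```python
def threshold_update(state, weights, threshold=0):
    """One synchronous update of a Boolean network.

    state: list of 0/1 gene values.
    weights[i][j]: effect of gene j on gene i
                   (+1 promotes, -1 represses, 0 no interaction).
    """
    new_state = []
    for i, row in enumerate(weights):
        total = sum(w * s for w, s in zip(row, state))
        if total > threshold:
            new_state.append(1)
        elif total < threshold:
            new_state.append(0)
        else:
            new_state.append(state[i])  # tie: keep current value (assumption)
    return new_state

def hamming(a, b):
    """Number of genes whose values differ between two states."""
    return sum(x != y for x, y in zip(a, b))

def derrida_point(state_a, state_b, weights):
    """Hamming distance before and after one update of both states."""
    after_a = threshold_update(state_a, weights)
    after_b = threshold_update(state_b, weights)
    return hamming(state_a, state_b), hamming(after_a, after_b)

# Hypothetical 3-gene loop: 0 promotes 1, 1 promotes 2, 2 represses 0.
W = [[0, 0, -1],
     [1, 0, 0],
     [0, 1, 0]]
next_state = threshold_update([1, 0, 0], W)        # [1, 1, 0]
point = derrida_point([1, 0, 0], [0, 0, 0], W)     # (1, 2)
```

Averaging the second distance over many random state pairs at each initial distance, usually after normalizing by the number of genes, gives the curve whose position relative to the diagonal separates ordered from chaotic behavior.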
Ewing's sarcoma family tumors (ESFT) are the second most common bone malignancy in children and young adults, characterized by unique chromosomal translocations that in 85% of cases lead to expression of the EWS-FLI-1 fusion protein. EWS-FLI-1 functions as an aberrant transcription factor that can both induce and suppress members of its target gene repertoire. We have recently demonstrated that EWS-FLI-1 can alter microRNA (miRNA) expression and that miRNA145 is a direct EWS-FLI-1 target whose suppression is implicated in ESFT development. Here, we use miRNA arrays to compare the global miRNA expression profile of human mesenchymal stem cells (MSC) and ESFT cell lines, and show that ESFT display a distinct miRNA signature that includes induction of the oncogenic miRNA 17–92 cluster and repression of the tumor suppressor let-7 family. We demonstrate that direct repression of let-7a by EWS-FLI-1 participates in the tumorigenic potential of ESFT cells in vivo. The mechanism whereby let-7a expression regulates ESFT growth is shown to be mediated by its target gene HMGA2, as let-7a overexpression and HMGA2 repression both block ESFT cell tumorigenicity. Consistent with these observations, systemic delivery of synthetic let-7a into ESFT-bearing mice restored its expression in tumor cells, decreased HMGA2 expression levels and resulted in ESFT growth inhibition in vivo. Our observations provide evidence that deregulation of let-7a target gene expression participates in ESFT development and identify let-7a as a promising new therapeutic target for one of the most aggressive pediatric malignancies.