The identification of human disease-related microRNAs (disease miRNAs) is important for further investigating their involvement in the pathogenesis of diseases. More experimentally validated miRNA-disease associations have been accumulated recently. On the basis of these associations, it is essential to predict disease miRNAs for various human diseases. It is useful in providing reliable disease miRNA candidates for subsequent experimental studies.
It is known that miRNAs with similar functions are often associated with similar diseases and vice versa. Therefore, the functional similarity of two miRNAs has been successfully estimated by measuring the semantic similarity of their associated diseases. To effectively predict disease miRNAs, we calculated the functional similarity by incorporating the information content of disease terms and phenotype similarity between diseases. Furthermore, members of the same miRNA family or cluster are assigned a higher weight because they are more likely to be associated with similar diseases. A new prediction method, HDMP, based on the weighted k most similar neighbors is presented for predicting disease miRNAs. Experiments validated that HDMP achieved significantly higher prediction performance than existing methods. In addition, case studies examining prostatic neoplasms, breast neoplasms, and lung neoplasms showed that HDMP can uncover potential disease miRNA candidates.
The superior performance of HDMP can be attributed to the accurate measurement of miRNA functional similarity, the weight assignment based on miRNA family or cluster, and the effective prediction based on weighted k most similar neighbors. The online prediction and analysis tool is freely available at http://nclab.hit.edu.cn/hdmpred.
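As a rough illustration of the weighted k most similar neighbors idea, the following Python sketch scores a candidate miRNA by summing the weighted functional similarities of those of its k most similar neighbors that are already known to be associated with the disease. It is a minimal sketch under stated assumptions, not the HDMP implementation: the similarity values, the family weight of 2.0, the miRNA names, and the function name are all hypothetical.

```python
def knn_score(candidate, similarity, known_disease_mirnas, family,
              k=3, family_weight=2.0):
    """Score a candidate miRNA from its k most similar neighbors.

    similarity: dict mapping frozenset({a, b}) -> functional similarity
    family: dict mapping miRNA name -> family/cluster label (illustrative)
    family_weight: assumed extra weight for neighbors in the same family
    """
    # Collect the similarity between the candidate and every other miRNA.
    neighbors = []
    for pair, sim in similarity.items():
        if candidate in pair and len(pair) == 2:
            (other,) = pair - {candidate}
            neighbors.append((sim, other))

    # Keep only the k most similar neighbors.
    neighbors.sort(reverse=True)
    top_k = neighbors[:k]

    score = 0.0
    for sim, other in top_k:
        # Only neighbors already associated with the disease contribute.
        if other not in known_disease_mirnas:
            continue
        same_family = (candidate in family
                       and family.get(other) == family[candidate])
        weight = family_weight if same_family else 1.0
        score += weight * sim
    return score


# Hypothetical toy data.
similarity = {
    frozenset({"miR-a", "miR-b"}): 0.9,
    frozenset({"miR-a", "miR-c"}): 0.5,
    frozenset({"miR-a", "miR-d"}): 0.2,
    frozenset({"miR-b", "miR-c"}): 0.4,
}
known = {"miR-b", "miR-d"}
family = {"miR-a": "fam1", "miR-b": "fam1", "miR-c": "fam2"}
score = knn_score("miR-a", similarity, known, family, k=2)
```

With k=2, only miR-b and miR-c are considered as neighbors of miR-a; miR-d is known to be disease-associated but is not among the two most similar neighbors, so it does not contribute to the score.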
Transforming growth factor β (TGF-β) signaling regulates cell growth and survival. Dysregulation of the TGF-β pathway is common in viral infection and cancer. Latent infection by Kaposi's sarcoma-associated herpesvirus (KSHV) is required for the development of several AIDS-related malignancies, including Kaposi's sarcoma and primary effusion lymphoma (PEL). KSHV encodes more than two dozen microRNAs (miRs) derived from 12 pre-miRs with largely unknown functions. In this study, we show that miR variants processed from pre-miR-K10 are expressed in KSHV-infected PEL cells and endothelial cells, while cellular miR-142-3p and its variant miR-142-3p_-1_5, which share the same seed sequence with miR-K10a_+1_5, are expressed only in PEL cells and not in uninfected and KSHV-infected TIME cells. KSHV miR-K10 variants inhibit TGF-β signaling by targeting TGF-β type II receptor (TβRII). Computational and reporter mutagenesis analyses identified three functional target sites in the TβRII 3′ untranslated region (3′UTR). Expression of miR-K10 variants is sufficient to inhibit TGF-β-induced cell apoptosis. A suppressor of the miRs sensitizes latent KSHV-infected PEL cells to TGF-β and induces apoptosis. These results indicate that miR-K10 variants manipulate the TGF-β pathway to confer cells with resistance to the growth-inhibitory effect of TGF-β. Thus, KSHV miRs might target the tumor-suppressive TGF-β pathway to promote viral latency and contribute to malignant cellular transformation.
MicroRNAs (miRNAs) have been implicated in the control of many biological processes, and their deregulation has been associated with many cancers. In recent years, the cancer stem cell (CSC) concept has been applied to many cancers, including pediatric cancers. We hypothesized that a common signature of deregulated miRNAs in the CSC fraction may explain the disrupted signaling pathways in CSCs.
Using a high-throughput qPCR approach, we identified 26 CSC-associated differentially expressed miRNAs (DEmiRs). Using the BCmicrO algorithm, we obtained 865 potential CSC-associated DEmiR targets. These potential targets were subjected to KEGG, Biocarta, and Gene Ontology pathway and biological process analysis. Four annotated pathways were enriched: cell cycle, cell proliferation, p53, and TGF-beta/BMP. Knocking down hsa-miR-21-5p, hsa-miR-181c-5p, and hsa-miR-135b-5p using antisense oligonucleotides and small interfering RNA in cell lines led to the depletion of the CSC fraction and impairment of sphere formation (CSC surrogate assays).
Our findings indicated that CSC associated DEmiRs and the putative pathways they regulate may have potential therapeutic applications in pediatric cancers.
Common analyses of microarray and next-generation sequencing data concentrate on tumor subtype classification, marker detection, and the discovery of transcriptional regulation during biological processes by exploring correlated gene expression patterns and their shared functions. Genetic regulatory network (GRN) based approaches have been employed in many large studies to scrutinize dysregulation and potential treatment controls. In addition to gene regulation and network construction, the concept of a network modulator with significant systemic impact has been proposed, and detection algorithms have been developed in recent years. Here we provide a unified mathematical description of these methods, followed by a brief survey of these modulator identification algorithms. As an early attempt to extend the modulator framework to a new RNA regulation mechanism, competitive endogenous RNA (ceRNA), we provide two applications that illustrate the network construction, the modulation effect, and the preliminary findings from these networks. The methods we surveyed and developed are used to dissect the regulated network under different modulators. Beyond these applications, the concept of “modulation” can be adapted to various biological mechanisms to discover novel gene regulation mechanisms.
The 2012 International Conference on Intelligent Biology and Medicine (ICIBM 2012) was held on April 22-24, 2012 in Nashville, Tennessee, USA. The conference featured six technical sessions, one tutorial session, one workshop, and 3 keynote presentations that covered state-of-the-art research activities in genomics, systems biology, and intelligent computing. In addition to a major emphasis on the next generation sequencing (NGS)-driven informatics, ICIBM 2012 aligned significant interests in systems biology and its applications in medicine. We highlight in this editorial the selected papers from the meeting that address the developments of novel algorithms and applications in systems biology.
MicroRNAs (miRNAs) are non-coding RNAs of 19-25 nucleotides known to have important post-transcriptional regulatory functions. Computational target prediction is vital to effective experimental testing. However, because existing algorithms rely on different features and classifiers, there is poor agreement among their results. To benefit from the advantages of different algorithms, we proposed an algorithm called BCmicrO that combines the predictions of different algorithms using a Bayesian network. BCmicrO was evaluated using the training data and the proteomic data. The results show that BCmicrO improves on both the sensitivity and the specificity of each individual algorithm. All the related materials including genome-wide prediction of human targets and a web-based tool are available at http://compgenomics.utsa.edu/gene/gene_1.php.
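To make the idea of combining several predictors probabilistically more concrete, the sketch below uses the simplest possible Bayesian network, a naive Bayes structure in which each algorithm's binary call depends only on whether the gene is a true target. This is an assumed stand-in, not the network structure used by BCmicrO; the algorithm names, sensitivities, and false positive rates are hypothetical.

```python
def posterior_target(calls, sensitivity, false_positive_rate, prior=0.5):
    """Return P(target | calls) under a naive Bayes network.

    calls: dict mapping algorithm name -> True/False prediction
    sensitivity: dict mapping algorithm name -> P(call=True | target)
    false_positive_rate: dict mapping name -> P(call=True | non-target)
    """
    p_true = prior          # joint probability with "is a target"
    p_false = 1.0 - prior   # joint probability with "is not a target"
    for name, call in calls.items():
        tpr = sensitivity[name]
        fpr = false_positive_rate[name]
        p_true *= tpr if call else (1.0 - tpr)
        p_false *= fpr if call else (1.0 - fpr)
    # Normalize to obtain the posterior probability of being a target.
    return p_true / (p_true + p_false)


# Hypothetical error rates for two illustrative predictors.
sens = {"algo_A": 0.8, "algo_B": 0.6}
fpr = {"algo_A": 0.2, "algo_B": 0.3}
both_agree = posterior_target({"algo_A": True, "algo_B": True}, sens, fpr)
both_reject = posterior_target({"algo_A": False, "algo_B": False}, sens, fpr)
```

When both predictors call a target, the posterior rises above either predictor's individual evidence; when both reject, it falls well below the prior.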
We present a report of the 2012 International Conference on Intelligent Biology and Medicine (ICIBM 2012) and the editorial report of the supplement to BMC Genomics that includes 22 research papers selected from ICIBM 2012, which was held on April 22-24, 2012 in Nashville, Tennessee, USA. The conference covered a variety of research areas, including bioinformatics, systems biology, and intelligent computing. It included six sessions, a tutorial - Introduction to Proteome Informatics, a workshop - Next Generation Sequencing, and a poster session. The selected papers in this Supplement issue represent the genomic focus in ICIBM 2012.
DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation are hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, due to the unique non-Gaussian characteristics, traditional clustering methods may not be appropriate for DNA methylation data, and the determination of the optimal cluster number is still problematic.
A Dirichlet process beta mixture model (DPBMM) is proposed that models the DNA methylation levels as an infinite mixture of beta distributions. The model allows automatic learning of the relevant parameters such as the cluster mixing proportions, the parameters of the beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we proposed a Gibbs sampling "no-gaps" solution for computing the posterior distributions and hence the estimates of the parameters.
The proposed algorithm was tested on simulated data as well as methylation data from 55 glioblastoma multiforme (GBM) brain tissue samples. To reduce the computational burden due to the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters yielded by DPBMM are based on data from different numbers of loci (P-value < 0.1), while hierarchical clustering cannot yield statistically significant clusters.
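The core of a Gibbs sampler for a Dirichlet process mixture is the conditional probability of assigning one observation to each existing cluster or to a new one. The sketch below shows that single step for a one-dimensional beta mixture. It is a simplified illustration, not the "no-gaps" algorithm itself: the concentration parameter `alpha`, the fixed parameters used for a new cluster, and the example cluster values are assumptions.

```python
import math


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def assignment_probs(x, clusters, alpha, new_params=(1.0, 1.0)):
    """Probabilities of assigning x to each cluster or to a new cluster.

    clusters: list of (count, a, b) for the existing clusters
    alpha: Dirichlet process concentration parameter (assumed value)
    new_params: assumed beta parameters for a new cluster (simplification)
    The last entry of the result is the probability of a new cluster.
    """
    # Existing cluster k has weight n_k * Beta(x | a_k, b_k).
    log_w = [math.log(n) + beta_logpdf(x, a, b) for n, a, b in clusters]
    # A new cluster has weight alpha * Beta(x | new_params).
    log_w.append(math.log(alpha) + beta_logpdf(x, *new_params))
    # Normalize in log space for numerical stability.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


# A methylation level of 0.1 with one low and one high methylation cluster.
probs = assignment_probs(0.1, [(10, 2.0, 8.0), (10, 8.0, 2.0)], alpha=1.0)
```

Because the number of clusters is not fixed in advance, the final entry of `probs` is what allows the sampler to open a new cluster when no existing one explains the observation well.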
This paper considers the problem of automatic characterization and detection of target images in a rapid serial visual presentation (RSVP) task based on EEG data. A novel method that aims to identify single-trial event-related potentials (ERPs) in time-frequency is proposed, and a robust classifier with feature clustering is developed to better utilize the correlated ERP features. The method is applied to EEG recordings of an RSVP experiment with multiple sessions and subjects.
The results show that the target image events are mainly characterized by 3 distinct patterns in the time-frequency domain, i.e., a theta band (4.3 Hz) power boosting 300–700 ms after the target image onset, an alpha band (12 Hz) power boosting 500–1000 ms after the stimulus onset, and a delta band (2 Hz) power boosting after 500 ms. The most discriminant time-frequency features are power boosting and are relatively consistent among multiple sessions and subjects.
Since the original discriminant time-frequency features are highly correlated, we constructed the uncorrelated features using hierarchical clustering for better classification of target and non-target images. With feature clustering, performance (area under ROC) improved from 0.85 to 0.89 on within-session tests, and from 0.76 to 0.84 on cross-subject tests. The constructed uncorrelated features were more robust than the original discriminant features and corresponded to a number of local regions on the time-frequency plane.
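The feature clustering step described above can be sketched as follows. Highly correlated feature columns are grouped by single-linkage hierarchical clustering cut at a correlation threshold (equivalent to taking connected components of the thresholded correlation graph), and each group is averaged into one feature. The threshold of 0.8, the use of averaging to build the merged feature, and the toy data are assumptions made for illustration; the abstract does not specify the linkage or how the uncorrelated features were constructed.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def cluster_features(columns, threshold=0.8):
    """Group feature indices whose correlation is >= threshold.

    Single-linkage clustering cut at the threshold, done with union-find.
    """
    parent = list(range(len(columns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if pearson(columns[i], columns[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(columns)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def merge_features(columns, groups):
    """Average the columns in each group into a single feature."""
    n_trials = len(columns[0])
    return [
        [sum(columns[i][t] for i in group) / len(group)
         for t in range(n_trials)]
        for group in groups
    ]


# Toy data: features 0 and 1 are nearly collinear, feature 2 is not.
columns = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 1, 3, 2]]
groups = cluster_features(columns)
merged = merge_features(columns, groups)
```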
Availability: The data and code are available at: http://compgenomics.cbi.utsa.edu/rsvp/index.html
Infections by viruses are associated with approximately 12% of human cancer. Kaposi’s sarcoma-associated herpesvirus (KSHV) is causally linked to several malignancies commonly found in AIDS patients. The mechanism of KSHV-induced oncogenesis remains elusive, due in part to the lack of an adequate experimental system for cellular transformation of primary cells. Here, we report efficient infection and cellular transformation of primary rat embryonic metanephric mesenchymal precursor cells (MM cells) by KSHV. Cellular transformation occurred at as early as day 4 after infection and in nearly all infected cells. Transformed cells expressed hallmark vascular endothelial, lymphatic endothelial, and mesenchymal markers and efficiently induced tumors in nude mice. KSHV established latent infection in MM cells, and lytic induction resulted in low levels of detectable infectious virions despite robust expression of lytic genes. Most KSHV-induced tumor cells were in a latent state, although a few showed heterogeneous expression of lytic genes. This efficient system for KSHV cellular transformation of primary cells might facilitate the study of growth deregulation mechanisms resulting from KSHV infections.
The availability of temporal measurements on biological experiments has significantly promoted research areas in systems biology. To gain insight into the interaction and regulation of biological systems, mathematical frameworks such as ordinary differential equations have been widely applied to model biological pathways and interpret the temporal data. Hill equations are the preferred formats to represent the reaction rate in differential equation frameworks, due to their simple structures and their capabilities for easy fitting to saturated experimental measurements. However, Hill equations are highly nonlinearly parameterized functions, and parameters in these functions cannot be measured easily. Additionally, because of their high nonlinearity, adaptive parameter estimation algorithms developed for linearly parameterized differential equations cannot be applied. Therefore, parameter estimation in nonlinearly parameterized differential equation models for biological pathways is both challenging and rewarding. In this study, we propose a Bayesian parameter estimation algorithm to estimate parameters in nonlinear mathematical models for biological pathways using time series data.
We used the Runge-Kutta method to transform differential equations to difference equations assuming a known structure of the differential equations. This transformation allowed us to generate predictions dependent on previous states and to apply a Bayesian approach, namely, the Markov chain Monte Carlo (MCMC) method. We applied this approach to the biological pathways involved in the left ventricle (LV) response to myocardial infarction (MI) and verified our algorithm by estimating two parameters in a Hill equation embedded in the nonlinear model. We further evaluated our estimation performance with different parameter settings and signal to noise ratios. Our results demonstrated the effectiveness of the algorithm for both linearly and nonlinearly parameterized dynamic systems.
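The two ingredients described above, a Runge-Kutta discretization of the differential equation and MCMC sampling of a Hill-equation parameter, can be sketched in a few lines. This is a toy example under stated assumptions, not the model used in the study: the one-variable system, the ramp input s(t) = t, the noise level, the flat prior on K > 0, and all parameter values are hypothetical, and only one parameter (the half-saturation constant K) is estimated.

```python
import math
import random


def hill(s, vmax, K, n):
    """Hill equation: saturating reaction rate as a function of input s."""
    return vmax * s**n / (K**n + s**n)


def rk4_step(f, t, x, h):
    """One classical fourth-order Runge-Kutta step for dx/dt = f(t, x)."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h * k1 / 2)
    k3 = f(t + h / 2, x + h * k2 / 2)
    k4 = f(t + h, x + h * k3)
    return x + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def simulate(K, steps=20, h=0.2, vmax=1.0, n=2.0, decay=0.5):
    """Difference-equation form of dx/dt = hill(t) - decay * x, x(0) = 0."""
    f = lambda t, x: hill(t, vmax, K, n) - decay * x
    xs, x = [], 0.0
    for i in range(steps):
        x = rk4_step(f, i * h, x, h)
        xs.append(x)
    return xs


def log_likelihood(K, data, sigma=0.05):
    """Gaussian log-likelihood (up to a constant) of the data given K."""
    pred = simulate(K, steps=len(data))
    return -sum((d - p) ** 2 for d, p in zip(data, pred)) / (2 * sigma**2)


def metropolis(data, k_init=2.0, n_iter=2000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for K with a flat prior on K > 0."""
    rng = random.Random(seed)
    K, ll = k_init, log_likelihood(k_init, data)
    samples = []
    for _ in range(n_iter):
        proposal = K + rng.gauss(0.0, step)
        if proposal > 0:  # proposals outside the prior support are rejected
            ll_new = log_likelihood(proposal, data)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                K, ll = proposal, ll_new
        samples.append(K)
    return samples


# Noise-free synthetic data generated with a "true" K of 1.0.
data = simulate(1.0)
samples = metropolis(data)
posterior = samples[500:]  # discard burn-in
k_estimate = sum(posterior) / len(posterior)
```

Because each prediction depends only on the previous state, the likelihood can be evaluated by stepping the difference equation forward, which is what makes the sampling loop straightforward.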
Our proposed Bayesian algorithm successfully estimated parameters in nonlinear mathematical models for biological pathways. This method can be further extended to high order systems and thus provides a useful tool to analyze biological dynamics and extract information using temporal data.
MicroRNAs (miRNAs) are a set of short (19∼24 nt) non-coding RNAs that play significant roles as posttranscriptional regulators in animals and plants. The ab initio prediction methods show excellent performance for discovering new pre-miRNAs. While most of these methods can distinguish real pre-miRNAs from pseudo pre-miRNAs, few can predict the positions of miRNAs. Among the existing methods that can also predict the miRNA positions, most are designed for mammalian miRNAs, including human and mouse. Only a minority of methods can predict the positions of plant miRNAs. Accurate prediction of the miRNA positions remains a challenge, especially for plant miRNAs. This motivates us to develop MaturePred, a machine learning method based on a support vector machine, to predict the positions of plant miRNAs for new plant pre-miRNA candidates.
A miRNA:miRNA* duplex is regarded as a whole to capture the binding characteristics of miRNAs. We extract the position-specific features, the energy-related features, the structure-related features, and the stability-related features from real/pseudo miRNA:miRNA* duplexes. A set of informative features is selected to improve the prediction accuracy. A two-stage sample selection algorithm is proposed to combat the serious imbalance problem between real and pseudo miRNA:miRNA* duplexes. The prediction method, MaturePred, can accurately predict plant miRNAs and achieves higher prediction accuracy than the existing methods. Further, we trained a prediction model with animal data to predict animal miRNAs. The model also achieves higher prediction performance. This further confirms the efficiency of our miRNA prediction method.
The superior performance of the proposed prediction model can be attributed to the extracted features of plant miRNAs and miRNA*s, the selected training dataset, and the carefully selected features. The web service of MaturePred, the training datasets, the testing datasets, and the selected features are freely available at http://nclab.hit.edu.cn/maturepred/.
Transcriptional regulation by transcription factors (TFs) controls the timing and abundance of mRNA transcription. Due to the limitations of current proteomics technologies, large-scale measurement of protein-level TF activities is usually infeasible, making computational reconstruction of transcriptional regulatory networks a difficult task.
We propose here a novel Bayesian non-negative factor model for TF-mediated regulatory networks. In particular, the non-negative TF activities and the sample clustering effect are modeled as the factors from a Dirichlet process mixture of rectified Gaussian distributions, and the sparse regulatory coefficients are modeled as the loadings from a sparse distribution that constrains its sparsity using knowledge from databases. A Gibbs sampling solution was developed to infer the underlying network structure and the unknown TF activities simultaneously. The developed approach has been applied to simulated systems and breast cancer gene expression data. The results show that the proposed method was able to systematically uncover the TF-mediated transcriptional regulatory network structure, the regulatory coefficients, the TF protein-level activities, and the sample clustering effect. The regulation target prediction results are highly consistent with the prior knowledge, and the sample clustering results show superior performance over a previous molecular-based clustering method.
The results demonstrated the validity and effectiveness of the proposed approach in reconstructing transcriptional networks mediated by TFs through simulated systems and real data.
An algorithm for the discovery of time-varying modules using genome-wide expression data is presented here. When applied to large-scale time series data, our method is designed to discover not only the transcription modules but also their timing information, which is rarely annotated by existing approaches. Rather than assuming the commonly defined time-constant transcription modules, a module is depicted as a set of genes that are co-regulated during a specific period of time, i.e., a time dependent transcription module (TDTM). A rigorous mathematical definition of TDTM is provided, which serves as an objective function for retrieving modules. Based on the definition, an effective signature algorithm is proposed that iteratively searches for the transcription modules in the time series data. The proposed method was tested on simulated systems and applied to human time series microarray data collected during Kaposi's sarcoma-associated herpesvirus (KSHV) infection. The results have been verified by Expression Analysis Systematic Explorer.
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with a resolution of 10 000–15 000 full width at half-maximum (FWHM). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with expert visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with the same pathological condition.
Bayesian methods; Markov chain Monte Carlo; mass spectrometry; peptide peak detection; time-of-flight
MicroRNAs (miRNAs) are single-stranded non-coding RNAs shown to play important regulatory roles in a wide range of biological processes and diseases. The functions and regulatory mechanisms of most miRNAs are still poorly understood, in part because of the difficulty in identifying the miRNA regulatory targets. To this end, computational methods have evolved as important tools for genome-wide target screening. Although considerable work in the past few years has produced many target prediction algorithms, most of them are based solely on sequence, and their accuracy is still poor. In contrast, gene expression profiling from miRNA over-expression experiments can provide additional information about miRNA targets. This paper presents a Bayesian approach that integrates sequence-level prediction results with expression profiling to improve the performance of miRNA target identification. Tests on proteomic and IP pull-down data demonstrated better performance of the proposed approach.
miRNA; Target prediction; Bayesian methods; Gaussian mixture model
Human herpesviruses have latency and lytic replication phases in their lifecycle. Proper regulation of the herpesviral lifecycle is essential for the evasion of host immune surveillance and the development of related diseases. Recent advancements indicate a role for a novel class of viral non-coding RNAs, microRNAs (miRNAs), in the fine-tuning of the herpesviral lifecycle. So far, herpesviral miRNAs appear to promote viral latency by inhibiting viral lytic replication either through direct targeting of key viral replication genes or through manipulation of host pathways that regulate the viral lifecycle. The oncogenic Kaposi's sarcoma-associated herpesvirus (KSHV) has adopted both strategies to control viral latency. Our recent study has identified a KSHV miRNA that inhibits viral lytic replication by upregulating the NFκB pathway.
microRNA (miRNA); herpesviruses; Kaposi’s sarcoma-associated herpesvirus (KSHV); latency and replication; NFκB
Response of cells to changing endogenous or exogenous conditions is governed by intricate molecular interactions, or regulatory networks. To lead to appropriate responses, a regulatory network should be 1) context-specific, i.e., its constituents and topology depend on the phenotypic and experimental context including tissue types and cell conditions, such as damage, stress, macroenvironments of cell, etc. and 2) time varying, i.e., network elements and their regulatory roles change actively over time to control the endogenous cell states, e.g., different stages in a cell cycle.
A novel network model PathRNet and a reconstruction approach PATTERN are proposed for reconstructing the context specific time varying regulatory networks by integrating microarray gene expression profiles and existing knowledge of pathways and transcription factors. The nodes of the PathRNet are Transcription Factors (TFs) and pathways, and edges represent the regulation between pathways and TFs. The reconstructed PathRNet for Kaposi's sarcoma-associated herpesvirus infection of human endothelial cells reveals the complicated dynamics of the underlying regulatory mechanisms that govern this intricate process. All the related materials including source code are available at http://compgenomics.utsa.edu/tvnet.html.
The proposed PathRNet provides a system level landscape of the dynamics of gene regulatory circuitry. The inference approach PATTERN enables robust reconstruction of the temporal dynamics of pathway-centric regulatory networks. The proposed approach for the first time provides a dynamic perspective of pathway, TF regulations, and their interaction related to specific endogenous and exogenous conditions.
MicroRNAs (miRNAs) are single-stranded non-coding RNAs shown to play important regulatory roles in a wide range of biological processes and diseases. The functions and regulatory mechanisms of most miRNAs are still poorly understood, in part because of the difficulty in identifying the miRNA regulatory targets. To this end, computational methods have evolved as important tools for genome-wide target screening. Although considerable work in the past few years has produced many target prediction algorithms, most of them are based solely on sequence, and their accuracy is still poor. In contrast, gene expression profiling from miRNA transfection experiments can provide additional information about miRNA targets. However, most existing research assumes that down-regulated mRNAs are the targets. Given the fact that the primary function of miRNA is protein inhibition, this assumption is neither sufficient nor necessary.
A novel Bayesian approach is proposed in this paper that integrates sequence-level prediction with expression profiling of miRNA transfection. This approach does not restrict the targets to be down-expressed and thus improves the performance of existing target prediction algorithms. The proposed algorithm was tested on simulated data, proteomics data, and IP pull-down data and was shown to achieve better performance than existing approaches for target prediction. All the related materials including source code are available at http://compgenomics.utsa.edu/expmicro.html.
The proposed Bayesian algorithm properly integrates the sequence pairing data and mRNA expression profiles for miRNA target prediction. This algorithm is shown to have better prediction performance than existing algorithms.
MicroRNAs (miRNAs) are single-stranded non-coding RNAs known to regulate a wide range of cellular processes by silencing gene expression at the protein and/or mRNA levels. Computational prediction of miRNA targets is essential for elucidating the detailed functions of miRNAs. However, the prediction specificity and sensitivity of the existing algorithms are still too poor to generate meaningful, workable hypotheses for subsequent experimental testing. Constructing a richer and more reliable training data set and developing an algorithm that properly exploits this data set would be the key to improving the performance of current prediction algorithms.
A comprehensive training data set is constructed for mammalian miRNAs, with positive targets obtained from the most up-to-date miRNA target repository, miRecords, and negative targets derived from 20 microarray data sets. A new algorithm, SVMicrO, is developed, which assumes a 2-stage structure comprising a site support vector machine (SVM) followed by a UTR-SVM. SVMicrO makes predictions based on 21 optimal site features and 18 optimal UTR features, selected by training from a comprehensive collection of 113 site and 30 UTR features. Comprehensive evaluation of SVMicrO performance has been carried out on the training data, proteomics data, and immunoprecipitation (IP) pull-down data. Comparisons with some popular algorithms demonstrate consistent improvements in prediction specificity, sensitivity and precision in all tested cases. All the related materials including source code and genome-wide prediction of human targets are available at http://compgenomics.utsa.edu/svmicro.html.
A 2-stage SVM based new miRNA target prediction algorithm called SVMicrO is developed. SVMicrO is shown to be able to achieve robust performance. It holds the promise to achieve continuing improvement whenever better training data that contain additional verified or high confidence positive targets and properly selected negative targets are available.