The 2012 International Conference on Intelligent Biology and Medicine (ICIBM 2012) was held on April 22-24, 2012 in Nashville, Tennessee, USA. The conference featured six technical sessions, one tutorial session, one workshop, and 3 keynote presentations that covered state-of-the-art research activities in genomics, systems biology, and intelligent computing. In addition to a major emphasis on the next generation sequencing (NGS)-driven informatics, ICIBM 2012 aligned significant interests in systems biology and its applications in medicine. We highlight in this editorial the selected papers from the meeting that address the developments of novel algorithms and applications in systems biology.
MicroRNAs (miRNAs) are non-coding RNAs of 19-25 nucleotides known to have important post-transcriptional regulatory functions. Computational target prediction is vital to effective experimental testing. However, because existing algorithms rely on different features and classifiers, their results agree poorly with one another. To benefit from the advantages of different algorithms, we proposed an algorithm called BCmicrO that combines the predictions of different algorithms using a Bayesian network. BCmicrO was evaluated using the training data and the proteomic data. The results show that BCmicrO improves on both the sensitivity and the specificity of each individual algorithm. All the related materials, including genome-wide prediction of human targets and a web-based tool, are available at http://compgenomics.utsa.edu/gene/gene_1.php.
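As a rough illustration of combining several predictors probabilistically, the sketch below uses a naive Bayes model, which is the simplest possible Bayesian network. This is an assumption made for illustration only: the abstract does not describe BCmicrO's actual network structure, and the function names `fit_naive_bayes` and `posterior_target`, the binary call encoding, and the pseudo-count are all hypothetical.

```python
from math import exp, log


def fit_naive_bayes(calls, labels, pseudo=1.0):
    """Estimate P(call_k = 1 | label) per predictor, with Laplace smoothing.

    calls  : list of tuples, one 0/1 call per individual algorithm
    labels : list of 0/1 ground-truth labels (1 = validated target)
    """
    n_pred = len(calls[0])
    params = {}
    for y in (0, 1):
        rows = [c for c, lab in zip(calls, labels) if lab == y]
        params[y] = [
            (sum(r[k] for r in rows) + pseudo) / (len(rows) + 2 * pseudo)
            for k in range(n_pred)
        ]
    prior = (sum(labels) + pseudo) / (len(labels) + 2 * pseudo)
    return prior, params


def posterior_target(call, prior, params):
    """Posterior probability of a true target given all algorithms' calls."""
    log_odds = log(prior) - log(1 - prior)
    for k, c in enumerate(call):
        p1, p0 = params[1][k], params[0][k]
        log_odds += log(p1 if c else 1 - p1) - log(p0 if c else 1 - p0)
    return 1.0 / (1.0 + exp(-log_odds))


# Toy data (invented): three algorithms, six miRNA-gene pairs.
calls = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
labels = [1, 1, 1, 0, 0, 0]
prior, params = fit_naive_bayes(calls, labels)
print(posterior_target((1, 1, 1), prior, params))
```

The naive Bayes form treats the individual algorithms as conditionally independent given the true label; a richer network would relax that assumption.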
We present a report of the 2012 International Conference on Intelligent Biology and Medicine (ICIBM 2012) and the editorial report of the supplement to BMC Genomics that includes 22 research papers selected from ICIBM 2012, which was held on April 22-24, 2012 in Nashville, Tennessee, USA. The conference covered a variety of research areas, including bioinformatics, systems biology, and intelligent computing. It included six sessions, a tutorial - Introduction to Proteome Informatics, a workshop - Next Generation Sequencing, and a poster session. The selected papers in this Supplement issue represent the genomic focus in ICIBM 2012.
DNA methylation occurs in the context of a CpG dinucleotide. It is an important epigenetic modification, which can be inherited through cell division. The two major types of methylation are hypomethylation and hypermethylation. Unique methylation patterns have been shown to exist in diseases including various types of cancer. DNA methylation analysis promises to become a powerful tool in cancer diagnosis, treatment and prognostication. Large-scale methylation arrays are now available for studying methylation genome-wide. The Illumina methylation platform simultaneously measures cytosine methylation at more than 1500 CpG sites associated with over 800 cancer-related genes. Cluster analysis is often used to identify DNA methylation subgroups for prognosis and diagnosis. However, due to the unique non-Gaussian characteristics, traditional clustering methods may not be appropriate for DNA methylation data, and the determination of the optimal cluster number is still problematic.
A Dirichlet process beta mixture model (DPBMM) is proposed that models DNA methylation levels as an infinite mixture of beta distributions. The model allows automatic learning of the relevant parameters such as the cluster mixing proportions, the parameters of the beta distribution for each cluster, and especially the number of potential clusters. Since the model is high dimensional and analytically intractable, we proposed a Gibbs sampling "no-gaps" solution for computing the posterior distributions and hence the estimates of the parameters.
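To make the beta mixture idea concrete, the sketch below computes posterior cluster probabilities for one sample under a finite beta mixture with known parameters. It is not the Dirichlet process model or the "no-gaps" Gibbs sampler described above; the finite truncation, the assumption that loci are independent given the cluster, and the names `beta_logpdf` and `responsibilities` are all illustrative choices.

```python
from math import exp, lgamma, log


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + (a - 1) * log(x) + (b - 1) * log(1 - x))


def responsibilities(x, weights, shapes):
    """Posterior cluster probabilities for one sample.

    x       : methylation values in (0, 1), one per locus
    weights : mixing proportion of each cluster
    shapes  : shapes[c][j] = (a, b) of the beta distribution
              for cluster c at locus j
    """
    logp = []
    for w, locus_shapes in zip(weights, shapes):
        lp = log(w) + sum(
            beta_logpdf(xi, a, b) for xi, (a, b) in zip(x, locus_shapes)
        )
        logp.append(lp)
    # Normalise in log space for numerical stability.
    m = max(logp)
    unnorm = [exp(lp - m) for lp in logp]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Toy example (invented): a hypermethylated and a hypomethylated cluster.
shapes = [[(8, 2), (8, 2)], [(2, 8), (2, 8)]]
print(responsibilities([0.9, 0.8], [0.5, 0.5], shapes))
```

In a Dirichlet process version, the fixed list of clusters would be replaced by a prior that also assigns probability to opening a new cluster, which is how the number of clusters is learned from the data.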
The proposed algorithm was tested on simulated data as well as methylation data from 55 glioblastoma multiforme (GBM) brain tissue samples. To reduce the computational burden due to the high data dimensionality, a dimension reduction method is adopted. The two GBM clusters yielded by DPBMM are based on data with different numbers of loci (P-value < 0.1), while hierarchical clustering cannot yield statistically significant clusters.
This paper considers the problem of automatic characterization and detection of target images in a rapid serial visual presentation (RSVP) task based on EEG data. A novel method that aims to identify single-trial event-related potentials (ERPs) in the time-frequency domain is proposed, and a robust classifier with feature clustering is developed to better utilize the correlated ERP features. The method is applied to EEG recordings of an RSVP experiment with multiple sessions and subjects.
The results show that the target image events are mainly characterized by 3 distinct patterns in the time-frequency domain, i.e., a theta band (4.3 Hz) power boosting 300–700 ms after the target image onset, an alpha band (12 Hz) power boosting 500–1000 ms after the stimulus onset, and a delta band (2 Hz) power boosting after 500 ms. The most discriminant time-frequency features are power boosting and are relatively consistent among multiple sessions and subjects.
Since the original discriminant time-frequency features are highly correlated, we constructed the uncorrelated features using hierarchical clustering for better classification of target and non-target images. With feature clustering, performance (area under ROC) improved from 0.85 to 0.89 on within-session tests, and from 0.76 to 0.84 on cross-subject tests. The constructed uncorrelated features were more robust than the original discriminant features and corresponded to a number of local regions on the time-frequency plane.
Availability: The data and code are available at: http://compgenomics.cbi.utsa.edu/rsvp/index.html
Infections by viruses are associated with approximately 12% of human cancer. Kaposi’s sarcoma-associated herpesvirus (KSHV) is causally linked to several malignancies commonly found in AIDS patients. The mechanism of KSHV-induced oncogenesis remains elusive, due in part to the lack of an adequate experimental system for cellular transformation of primary cells. Here, we report efficient infection and cellular transformation of primary rat embryonic metanephric mesenchymal precursor cells (MM cells) by KSHV. Cellular transformation occurred at as early as day 4 after infection and in nearly all infected cells. Transformed cells expressed hallmark vascular endothelial, lymphatic endothelial, and mesenchymal markers and efficiently induced tumors in nude mice. KSHV established latent infection in MM cells, and lytic induction resulted in low levels of detectable infectious virions despite robust expression of lytic genes. Most KSHV-induced tumor cells were in a latent state, although a few showed heterogeneous expression of lytic genes. This efficient system for KSHV cellular transformation of primary cells might facilitate the study of growth deregulation mechanisms resulting from KSHV infections.
The availability of temporal measurements from biological experiments has significantly promoted research in systems biology. To gain insight into the interaction and regulation of biological systems, mathematical frameworks such as ordinary differential equations have been widely applied to model biological pathways and interpret the temporal data. Hill equations are the preferred form for representing reaction rates in differential equation frameworks, due to their simple structure and the ease with which they can be fitted to saturating experimental measurements. However, Hill equations are highly nonlinearly parameterized functions, and parameters in these functions cannot be measured easily. Additionally, because of their high nonlinearity, adaptive parameter estimation algorithms developed for linearly parameterized differential equations cannot be applied. Therefore, parameter estimation in nonlinearly parameterized differential equation models for biological pathways is both challenging and rewarding. In this study, we propose a Bayesian parameter estimation algorithm to estimate parameters in nonlinear mathematical models for biological pathways using time series data.
We used the Runge-Kutta method to transform differential equations to difference equations assuming a known structure of the differential equations. This transformation allowed us to generate predictions dependent on previous states and to apply a Bayesian approach, namely, the Markov chain Monte Carlo (MCMC) method. We applied this approach to the biological pathways involved in the left ventricle (LV) response to myocardial infarction (MI) and verified our algorithm by estimating two parameters in a Hill equation embedded in the nonlinear model. We further evaluated our estimation performance with different parameter settings and signal to noise ratios. Our results demonstrated the effectiveness of the algorithm for both linearly and nonlinearly parameterized dynamic systems.
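The workflow described here (discretize the ODE with Runge-Kutta, then sample the parameters with MCMC) can be sketched on a toy problem. Everything specific in the block below is an assumption for illustration: the one-variable model `dy/dt = hill(t) - decay * y`, the input signal `s(t) = t`, the fixed Hill coefficient `n = 2`, the flat prior on positive parameters, the Gaussian noise model, and all function names. It is not the left-ventricle pathway model from the study.

```python
import math
import random


def hill(x, vmax, k, n=2.0):
    """Hill equation: saturating rate with half-maximum at x = k."""
    return vmax * x ** n / (k ** n + x ** n)


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def simulate(vmax, k, steps=40, h=0.25, decay=0.3):
    """Difference-equation form of the toy ODE (input signal s(t) = t)."""
    def rhs(t, y):
        return hill(t, vmax, k) - decay * y
    y, traj = 0.0, [0.0]
    for i in range(steps):
        y = rk4_step(rhs, i * h, y, h)
        traj.append(y)
    return traj


def log_likelihood(theta, data, sigma):
    """Gaussian log-likelihood up to a constant; -inf outside the prior."""
    vmax, k = theta
    if vmax <= 0 or k <= 0:
        return -math.inf
    pred = simulate(vmax, k)
    return -sum((d - p) ** 2 for d, p in zip(data, pred)) / (2 * sigma ** 2)


def metropolis(data, sigma=0.05, iters=2000, step=0.1, seed=0):
    """Random-walk Metropolis sampler over (vmax, k)."""
    rng = random.Random(seed)
    theta = (1.0, 1.0)
    ll = log_likelihood(theta, data, sigma)
    chain = []
    for _ in range(iters):
        prop = (theta[0] + rng.gauss(0, step), theta[1] + rng.gauss(0, step))
        ll_prop = log_likelihood(prop, data, sigma)
        # Accept with probability min(1, likelihood ratio).
        if rng.random() < math.exp(min(0.0, ll_prop - ll)):
            theta, ll = prop, ll_prop
        chain.append(theta)
    return chain


# Usage: simulate noise-free data from known parameters, then sample.
data = simulate(vmax=2.0, k=1.5)
chain = metropolis(data)
burned = chain[len(chain) // 2:]
print(sum(v for v, _ in burned) / len(burned),
      sum(k for _, k in burned) / len(burned))
```

Averaging the second half of the chain gives a simple posterior-mean estimate; in practice one would also check convergence and mixing before trusting it.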
Our proposed Bayesian algorithm successfully estimated parameters in nonlinear mathematical models for biological pathways. This method can be further extended to high order systems and thus provides a useful tool to analyze biological dynamics and extract information using temporal data.
MicroRNAs (miRNAs) are a set of short (19∼24 nt) non-coding RNAs that play significant roles as posttranscriptional regulators in animals and plants. The ab initio prediction methods show excellent performance for discovering new pre-miRNAs. While most of these methods can distinguish real pre-miRNAs from pseudo pre-miRNAs, few can predict the positions of miRNAs. Among the existing methods that can also predict the miRNA positions, most are designed for mammalian miRNAs, including human and mouse. Only a minority of methods can predict the positions of plant miRNAs. Accurate prediction of the miRNA positions remains a challenge, especially for plant miRNAs. This motivates us to develop MaturePred, a machine learning method based on the support vector machine, to predict the positions of plant miRNAs for new plant pre-miRNA candidates.
A miRNA:miRNA* duplex is regarded as a whole to capture the binding characteristics of miRNAs. We extract the position-specific features, the energy related features, the structure related features, and stability related features from real/pseudo miRNA:miRNA* duplexes. A set of informative features is selected to improve the prediction accuracy. A two-stage sample selection algorithm is proposed to combat the serious imbalance between real and pseudo miRNA:miRNA* duplexes. The prediction method, MaturePred, can accurately predict plant miRNAs and achieves higher prediction accuracy than the existing methods. Further, we trained a prediction model with animal data to predict animal miRNAs. The model also achieves higher prediction performance, which further confirms the effectiveness of our miRNA prediction method.
The superior performance of the proposed prediction model can be attributed to the extracted features of plant miRNAs and miRNA*s, the selected training dataset, and the carefully selected features. The web service of MaturePred, the training datasets, the testing datasets, and the selected features are freely available at http://nclab.hit.edu.cn/maturepred/.
Transcriptional regulation by transcription factors (TFs) controls the timing and abundance of mRNA transcription. Due to the limitations of current proteomics technologies, large-scale measurements of protein-level activities of TFs are usually infeasible, making computational reconstruction of transcriptional regulatory networks a difficult task.
We proposed here a novel Bayesian non-negative factor model for TF mediated regulatory networks. In particular, the non-negative TF activities and the sample clustering effect are modeled as the factors from a Dirichlet process mixture of rectified Gaussian distributions, and the sparse regulatory coefficients are modeled as the loadings from a sparse distribution whose sparsity is constrained using knowledge from databases. A Gibbs sampling solution was developed to infer the underlying network structure and the unknown TF activities simultaneously. The developed approach has been applied to simulated systems and breast cancer gene expression data. The results show that the proposed method was able to systematically uncover the TF mediated transcriptional regulatory network structure, the regulatory coefficients, the TF protein level activities and the sample clustering effect. The regulation target predictions are highly consistent with the prior knowledge, and the sample clustering results show superior performance over a previous molecular based clustering method.
The results demonstrated the validity and effectiveness of the proposed approach in reconstructing transcriptional networks mediated by TFs through simulated systems and real data.
An algorithm for the discovery of time varying modules using genome-wide expression data is presented here. When applied to large-scale time series data, our method is designed to discover not only the transcription modules but also their timing information, which is rarely annotated by the existing approaches. Rather than assuming commonly defined time constant transcription modules, a module is depicted as a set of genes that are co-regulated during a specific period of time, i.e., a time dependent transcription module (TDTM). A rigorous mathematical definition of TDTM is provided, which serves as an objective function for retrieving modules. Based on the definition, an effective signature algorithm is proposed that iteratively searches for the transcription modules in the time series data. The proposed method was tested on simulated systems and applied to human time series microarray data during Kaposi's sarcoma-associated herpesvirus (KSHV) infection. The result has been verified by Expression Analysis Systematic Explorer.
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000–15 000 full width at half-maximum (FWHM). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with an expert’s visual inspection. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological conditions.
Bayesian methods; Markov chain Monte Carlo; mass spectrometry; peptide peak detection; time-of-flight
MicroRNAs (miRNAs) are single-stranded non-coding RNAs shown to play important regulatory roles in a wide range of biological processes and diseases. The functions and regulatory mechanisms of most miRNAs are still poorly understood, in part because of the difficulty in identifying the miRNA regulatory targets. To this end, computational methods have evolved as important tools for genome-wide target screening. Although considerable work in the past few years has produced many target prediction algorithms, most of them are solely based on sequence, and their accuracy is still poor. In contrast, gene expression profiling from miRNA over-expression experiments can provide additional information about miRNA targets. This paper presents a Bayesian approach that integrates sequence level prediction results with expression profiling to improve the performance of miRNA target identification. Tests on proteomic and IP pull-down data demonstrated better performance of the proposed approach.
miRNA; Target prediction; Bayesian methods; Gaussian mixture model
Human herpesviruses have latency and lytic replication phases in their lifecycle. Proper regulation of the herpesviral lifecycle is essential for the evasion of host immune surveillance and the development of their related diseases. Recent advancements indicate a role of a novel class of viral non-coding RNAs, microRNA (miRNA), in the fine-tuning of the herpesviral lifecycle. So far, herpesviral miRNAs appear to promote viral latency by inhibiting viral lytic replication either through direct targeting of key viral replication genes or through manipulation of host pathways that regulate the viral lifecycle. The oncogenic Kaposi's sarcoma-associated herpesvirus (KSHV) has adapted both strategies to control viral latency. Our recent study has identified a KSHV miRNA that inhibits viral lytic replication by upregulating the NFκB pathway.
microRNA (miRNA); herpesviruses; kaposi’s sarcoma-associated herpesvirus (KSHV); latency and replication; NFκB
The response of cells to changing endogenous or exogenous conditions is governed by intricate molecular interactions, or regulatory networks. To lead to appropriate responses, a regulatory network should be 1) context-specific, i.e., its constituents and topology depend on the phenotypical and experimental context, including tissue types and cell conditions such as damage, stress, macroenvironments of the cell, etc., and 2) time varying, i.e., network elements and their regulatory roles change actively over time to control the endogenous cell states, e.g., different stages in a cell cycle.
A novel network model PathRNet and a reconstruction approach PATTERN are proposed for reconstructing the context specific time varying regulatory networks by integrating microarray gene expression profiles and existing knowledge of pathways and transcription factors. The nodes of the PathRNet are Transcription Factors (TFs) and pathways, and edges represent the regulation between pathways and TFs. The reconstructed PathRNet for Kaposi's sarcoma-associated herpesvirus infection of human endothelial cells reveals the complicated dynamics of the underlying regulatory mechanisms that govern this intricate process. All the related materials including source code are available at http://compgenomics.utsa.edu/tvnet.html.
The proposed PathRNet provides a system level landscape of the dynamics of gene regulatory circuitry. The inference approach PATTERN enables robust reconstruction of the temporal dynamics of pathway-centric regulatory networks. The proposed approach for the first time provides a dynamic perspective of pathway, TF regulations, and their interaction related to specific endogenous and exogenous conditions.
MicroRNAs (miRNAs) are single-stranded non-coding RNAs shown to play important regulatory roles in a wide range of biological processes and diseases. The functions and regulatory mechanisms of most miRNAs are still poorly understood, in part because of the difficulty in identifying the miRNA regulatory targets. To this end, computational methods have evolved as important tools for genome-wide target screening. Although considerable work in the past few years has produced many target prediction algorithms, most of them are solely based on sequence, and their accuracy is still poor. In contrast, gene expression profiling from miRNA transfection experiments can provide additional information about miRNA targets. However, most existing research assumes that down-regulated mRNAs are targets. Given that the primary function of miRNA is protein inhibition, this assumption is neither sufficient nor necessary.
A novel Bayesian approach is proposed in this paper that integrates sequence level prediction with expression profiling of miRNA transfection. This approach does not restrict the target to be down-expressed and thus improves the performance of existing target prediction algorithms. The proposed algorithm was tested on simulated data, proteomics data, and IP pull-down data and shown to achieve better performance than existing approaches for target prediction. All the related materials including source code are available at http://compgenomics.utsa.edu/expmicro.html.
The proposed Bayesian algorithm properly integrates the sequence pairing data and mRNA expression profiles for miRNA target prediction. This algorithm is shown to have better prediction performance than existing algorithms.
MicroRNAs (miRNAs) are single-stranded non-coding RNAs known to regulate a wide range of cellular processes by silencing gene expression at the protein and/or mRNA levels. Computational prediction of miRNA targets is essential for elucidating the detailed functions of miRNA. However, the prediction specificity and sensitivity of the existing algorithms are still too poor to generate meaningful, workable hypotheses for subsequent experimental testing. Constructing a richer and more reliable training data set and developing an algorithm that properly exploits this data set would be the key to improving the performance of current prediction algorithms.
A comprehensive training data set is constructed for mammalian miRNAs, with its positive targets obtained from the most up-to-date miRNA target depository, miRecords, and its negative targets derived from 20 microarray data sets. A new algorithm, SVMicrO, is developed, which assumes a 2-stage structure consisting of a site support vector machine (SVM) followed by a UTR-SVM. SVMicrO makes predictions based on 21 optimal site features and 18 optimal UTR features, selected by training from a comprehensive collection of 113 site and 30 UTR features. Comprehensive evaluation of SVMicrO performance has been carried out on the training data, proteomics data, and immunoprecipitation (IP) pull-down data. Comparisons with some popular algorithms demonstrate consistent improvements in prediction specificity, sensitivity and precision in all tested cases. All the related materials including source code and genome-wide prediction of human targets are available at http://compgenomics.utsa.edu/svmicro.html.
A 2-stage SVM based new miRNA target prediction algorithm called SVMicrO is developed. SVMicrO is shown to be able to achieve robust performance. It holds the promise to achieve continuing improvement whenever better training data that contain additional verified or high confidence positive targets and properly selected negative targets are available.
Kaposi’s sarcoma-associated herpesvirus (KSHV) is causally linked to several acquired immune deficiency syndrome related malignancies including Kaposi’s sarcoma (KS), primary effusion lymphoma (PEL), and a subset of multicentric Castleman’s disease (1). Control of viral lytic replication is essential for KSHV latency, evasion of host immune system, and induction of tumors (1). Here, we show that deletion of a cluster of 14 microRNAs (miRs) from KSHV genome significantly enhances viral lytic replication as a result of reduced NF-κB activity. The miR cluster regulates NF-κB pathway by reducing the expression of IκBα protein, an inhibitor of the NF-κB complexes. Computational and miR seed mutagenesis analyses identify KSHV miR-K1 that directly mediates IκBα protein level by targeting the 3’UTR of its transcript. Expression of miR-K1 is sufficient to rescue the NF-κB activity and inhibit viral lytic replication while inhibition of miR-K1 in KSHV-infected PEL cells has the opposite effects. Thus, KSHV encodes a miR to control viral replication by activating NF-κB pathway. These results illustrate an important role for KSHV miRs in regulating viral latency and lytic replication by manipulating a host survival pathway.
KSHV; microRNA; viral latency and replication; NF-κB; reverse genetics
Motivation: Clustering is a popular data exploration technique widely used in microarray data analysis. When dealing with time-series data, however, most conventional clustering algorithms either use one-way clustering methods, which fail to consider the heterogeneity of the temporal domain, or use two-way clustering methods that do not take into account the time dependency between samples, thus producing less informative results. Furthermore, enrichment analysis is often performed independently of and after clustering; such practice, though capable of revealing biologically significant clusters, cannot guide the clustering to produce biologically significant results.
Result: We present a new enrichment constrained framework (ECF) coupled with a time-dependent iterative signature algorithm (TDISA), which, by applying a sliding time window to incorporate the time dependency of samples and imposing an enrichment constraint to parameters of clustering, allows supervised identification of temporal transcription modules (TTMs) that are biologically meaningful. Rigorous mathematical definitions of TTM as well as the enrichment constraint framework are also provided that serve as objective functions for retrieving biologically significant modules. We applied the enrichment constrained time-dependent iterative signature algorithm (ECTDISA) to human gene expression time-series data of Kaposi's sarcoma-associated herpesvirus (KSHV) infection of human primary endothelial cells; the result not only confirms known biological facts, but also reveals new insight into the molecular mechanism of KSHV infection.
Availability: Data and Matlab code are available at http://engineering.utsa.edu/∼yfhuang/ECTDISA.html
Supplementary information: Supplementary data are available at Bioinformatics online.
MicroRNAs (miRNAs) are non-coding RNAs of 19 to 25 nucleotides known to possess important post-transcriptional regulatory functions. Identifying the target genes that miRNAs regulate is important for understanding their specific biological functions. Usually, miRNAs down-regulate target genes through binding to the complementary sites in the 3' untranslated region (UTR) of the targets. In part due to the large number of miRNAs and potential targets, an experiment-based prediction design would be extremely laborious and economically unfavorable. However, since the bindings of animal miRNAs are not a perfect one-to-one match with the complementary sites of their targets, it is difficult to predict targets of animal miRNAs by assessing their alignment to the 3' UTRs of potential targets. Consequently, sophisticated computational approaches for miRNA target prediction are considered essential methods in miRNA research.
We surveyed most of the current computational miRNA target prediction algorithms in this paper. In particular, we provided a mathematical definition and formulated the problem of target prediction under the framework of statistical classification. Moreover, we summarized the features of miRNA-target pairs used in target prediction approaches and discussed these approaches in two categories: rule-based and data-driven. The rule-based approach derives the classifier mainly from biological prior knowledge and important observations from biological experiments, whereas the data-driven approach builds statistical models using the training data and makes predictions based on the models. Finally, we tested a few different algorithms on a set of experimentally validated true miRNA-target pairs and a set of false miRNA-target pairs derived from a miRNA overexpression experiment. Receiver Operating Characteristic (ROC) curves were drawn to show the performances of these algorithms.
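For readers unfamiliar with how ROC curves are built from classifier scores, the following is a minimal sketch. The scores and labels are invented toy values, not results from the survey, and the helper names `roc_points` and `auc` are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) by sweeping a threshold over scores.

    scores : prediction score per miRNA-target pair (higher = more likely)
    labels : 1 for a validated true pair, 0 for a false pair
    Tied scores are handled together so they yield a single point.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy example: two true pairs and two false pairs.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts, auc(pts))
```

Comparing several algorithms then amounts to computing one such curve per algorithm on the same set of true and false pairs.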
In this review, we will discuss peak detection in Liquid-Chromatography-Mass Spectrometry (LC/MS) from a signal processing perspective. A brief introduction to LC/MS is followed by a description of the major processing steps in LC/MS. Specifically, the problem of peak detection is formulated and various peak detection algorithms are described and compared.