Search tips
Search criteria

Results 1-6 (6)

Clipboard (0)
more »
Year of Publication
Document Types
1.  From genes to networks: in systematic points of view 
BMC Systems Biology  2011;5(Suppl 3):I1.
We present a report of the BIOCOMP'10 - The 2010 International Conference on Bioinformatics & Computational Biology and other related work in the area of systems biology.
PMCID: PMC3287563  PMID: 22784614
2.  Statistical methods on detecting differentially expressed genes for RNA-seq data 
BMC Systems Biology  2011;5(Suppl 3):S1.
For RNA-seq data, the aggregated counts of the short reads from the same gene is used to approximate the gene expression level. The count data can be modelled as samples from Poisson distributions with possible different parameters. To detect differentially expressed genes under two situations, statistical methods for detecting the difference of two Poisson means are used. When the expression level of a gene is low, i.e., the number of count is small, it is usually more difficult to detect the mean differences, and therefore statistical methods which are more powerful for low expression level are particularly desirable. In statistical literature, several methods have been proposed to compare two Poisson means (rates). In this paper, we compare these methods by using simulated and real RNA-seq data.
Through simulation study and real data analysis, we find that the Wald test with the data being log-transformed is more powerful than other methods, including the likelihood ratio test, which has similar power as the variance stabilizing transformation test; both are more powerful than the conditional exact test and Fisher exact test.
When the count data in RNA-seq can be reasonably modelled as Poisson distribution, the Wald-Log test is more powerful and should be used to detect the differentially expressed genes.
PMCID: PMC3287564  PMID: 22784615
3.  Kernelized partial least squares for feature reduction and classification of gene microarray data 
BMC Systems Biology  2011;5(Suppl 3):S13.
The primary objectives of this paper are: 1.) to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p small n", or "high-dimension, low-sample size") microarray problem by eliminating those features (or probes) that do not contribute to the "best" chromosome bio-markers for lung cancer, and 2.) quantitatively measure and verify (by an independent means) the efficacy of this PLS process. A secondary objective is to integrate these significant improvements in diagnostic and prognostic biomedical applications into the clinical research arena. That is, to devise a framework for converting SLT results into direct, useful clinical information for patient care or pharmaceutical research. We, therefore, propose and preliminarily evaluate, a process whereby PLS, K-PLS, and Support Vector Machines (SVM) may be integrated with the accepted and well understood traditional biostatistical "gold standard", Cox Proportional Hazard model and Kaplan-Meier survival analysis methods. Specifically, this new combination will be illustrated with both PLS and Kaplan-Meier followed by PLS and Cox Hazard Ratios (CHR) and can be easily extended for both the K-PLS and SVM paradigms. Finally, these previously described processes are contained in the Fine Feature Selection (FFS) component of our overall feature reduction/evaluation process, which consists of the following components: 1.) coarse feature reduction, 2.) fine feature selection and 3.) classification (as described in this paper) and prediction.
Our results for PLS and K-PLS showed that these techniques, as part of our overall feature reduction process, performed well on noisy microarray data. The best performance was a good 0.794 Area Under a Receiver Operating Characteristic (ROC) Curve (AUC) for classification of recurrence prior to or after 36 months and a strong 0.869 AUC for classification of recurrence prior to or after 60 months. Kaplan-Meier curves for the classification groups were clearly separated, with p-values below 4.5e-12 for both 36 and 60 months. CHRs were also good, with ratios of 2.846341 (36 months) and 3.996732 (60 months).
SLT techniques such as PLS and K-PLS can effectively address difficult problems with analyzing biomedical data such as microarrays. The combinations with established biostatistical techniques demonstrated in this paper allow these methods to move from academic research and into clinical practice.
PMCID: PMC3287568  PMID: 22784619
4.  BindN+ for accurate prediction of DNA and RNA-binding residues from protein sequence features 
BMC Systems Biology  2010;4(Suppl 1):S3.
Understanding how biomolecules interact is a major task of systems biology. To model protein-nucleic acid interactions, it is important to identify the DNA or RNA-binding residues in proteins. Protein sequence features, including the biochemical property of amino acids and evolutionary information in terms of position-specific scoring matrix (PSSM), have been used for DNA or RNA-binding site prediction. However, PSSM is rather designed for PSI-BLAST searches, and it may not contain all the evolutionary information for modelling DNA or RNA-binding sites in protein sequences.
In the present study, several new descriptors of evolutionary information have been developed and evaluated for sequence-based prediction of DNA and RNA-binding residues using support vector machines (SVMs). The new descriptors were shown to improve classifier performance. Interestingly, the best classifiers were obtained by combining the new descriptors and PSSM, suggesting that they captured different aspects of evolutionary information for DNA and RNA-binding site prediction. The SVM classifiers achieved 77.3% sensitivity and 79.3% specificity for prediction of DNA-binding residues, and 71.6% sensitivity and 78.7% specificity for RNA-binding site prediction.
Predictions at this level of accuracy may provide useful information for modelling protein-nucleic acid interactions in systems biology studies. We have thus developed a web-based tool called BindN+ ( to make the SVM classifiers accessible to the research community.
PMCID: PMC2880409  PMID: 20522253
5.  A conceptual cellular interaction model of left ventricular remodelling post-MI: dynamic network with exit-entry competition strategy 
BMC Systems Biology  2010;4(Suppl 1):S5.
Progressive remodelling of the left ventricle (LV) following myocardial infarction (MI) is an outcome of spatial-temporal cellular interactions among different cell types that leads to heart failure for a significant number of patients. Cellular populations demonstrate temporal profiles of flux post-MI. However, little is known about the relationship between cell populations and the interaction strength among cells post-MI. The objective of this study was to establish a conceptual cellular interaction model based on a recently established graph network to describe the interaction between two types of cells.
We performed stability analysis to investigate the effects of the interaction strengths, the initial status, and the number of links between cells on the cellular population in the dynamic network. Our analysis generated a set of conditions on interaction strength, structure of the network, and initial status of the network to predict the evolutionary profiles of the network. Computer simulations of our conceptual model verified our analysis.
Our study introduces a dynamic network to model cellular interactions between two different cell types which can be used to model the cellular population changes post-MI. The results on stability analysis can be used as a tool to predict the responses of particular cell populations.
PMCID: PMC2880411  PMID: 20522255
6.  3D Protein structure prediction with genetic tabu search algorithm 
BMC Systems Biology  2010;4(Suppl 1):S6.
Protein structure prediction (PSP) has important applications in different fields, such as drug design, disease prediction, and so on. In protein structure prediction, there are two important issues. The first one is the design of the structure model and the second one is the design of the optimization technology. Because of the complexity of the realistic protein structure, the structure model adopted in this paper is a simplified model, which is called off-lattice AB model. After the structure model is assumed, optimization technology is needed for searching the best conformation of a protein sequence based on the assumed structure model. However, PSP is an NP-hard problem even if the simplest model is assumed. Thus, many algorithms have been developed to solve the global optimization problem. In this paper, a hybrid algorithm, which combines genetic algorithm (GA) and tabu search (TS) algorithm, is developed to complete this task.
In order to develop an efficient optimization algorithm, several improved strategies are developed for the proposed genetic tabu search algorithm. The combined use of these strategies can improve the efficiency of the algorithm. In these strategies, tabu search introduced into the crossover and mutation operators can improve the local search capability, the adoption of variable population size strategy can maintain the diversity of the population, and the ranking selection strategy can improve the possibility of an individual with low energy value entering into next generation. Experiments are performed with Fibonacci sequences and real protein sequences. Experimental results show that the lowest energy obtained by the proposed GATS algorithm is lower than that obtained by previous methods.
The hybrid algorithm has the advantages from both genetic algorithm and tabu search algorithm. It makes use of the advantage of multiple search points in genetic algorithm, and can overcome poor hill-climbing capability in the conventional genetic algorithm by using the flexible memory functions of TS. Compared with some previous algorithms, GATS algorithm has better performance in global optimization and can predict 3D protein structure more effectively.
PMCID: PMC2880412  PMID: 20522256

Results 1-6 (6)