Results 1-25 (2370)

1.  Conceptualization of molecular findings by mining gene annotations 
BMC Proceedings  2013;7(Suppl 7):S2.
The Gene Ontology (GO) is an ontology representing molecular biology concepts related to genes and their products. Current annotations from the GO Consortium tend to be highly specific, and contemporary genome-scale studies often return a long list of genes of potential interest, such as genes in a cancer tumor that are differentially expressed relative to those in normal tissue. It is therefore a challenging task to reveal, at a conceptual level, the major functional themes in which genes are involved. Presently, there is a need for tools capable of revealing such themes through mining and representing semantic information in an objective and quantitative manner.
In this study, we utilized the hierarchical organization of the GO to derive a more abstract representation of the major biological processes of a list of genes based on their annotations. We cast the task as follows: given a list of genes, identify non-disjoint, functionally coherent subsets, such that the functions of the genes in a subset are summarized by an informative GO term that accurately captures the semantic information of the original annotations.
We evaluated different metrics for assessing information loss when merging GO terms, and different statistical schemes to assess the functional coherence of a set of genes. We found that the best discriminative power was achieved by using a combination of the information-content-based measure as the information-loss metric, and the graph-based statistics derived from a Steiner tree connecting genes in an augmented GO graph.
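As a rough illustration of an information-content-based loss, the sketch below uses the common definition IC(t) = -log p(t), where p(t) is the fraction of genes annotated to term t or its descendants. The loss formula (difference in IC between a specific term and the ancestor it is merged into), the term names, and the gene counts are assumptions made for this example; the abstract does not state the exact metric.

```python
import math

def information_content(term, annotated_genes, total_genes):
    """IC(t) = -log p(t); p(t) is the fraction of genes annotated to t or below."""
    p = len(annotated_genes[term]) / total_genes
    return -math.log(p)

def information_loss(specific, general, annotated_genes, total_genes):
    """Assumed loss: IC given up when a specific term is replaced by an ancestor."""
    return (information_content(specific, annotated_genes, total_genes)
            - information_content(general, annotated_genes, total_genes))

# Hypothetical toy annotation sets (gene IDs are just integers here).
annotated = {
    "biological_process": set(range(100)),  # root term covers every gene
    "cell cycle": set(range(20)),
    "mitotic cell cycle": set(range(5)),
}
total = 100

loss = information_loss("mitotic cell cycle", "cell cycle", annotated, total)
root_ic = information_content("biological_process", annotated, total)
```

Under this definition the root term carries no information, and merging a term into a much broader ancestor costs more than merging it into a close one.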
Our methods provide an objective and quantitative approach to capturing the major directions of gene functions in a context-specific fashion.
PMCID: PMC4042834  PMID: 24564884
2.  Adaptive bandwidth kernel density estimation for next-generation sequencing data 
BMC Proceedings  2013;7(Suppl 7):S7.
High-throughput sequencing experiments can be viewed as measuring some sort of a "genomic signal" that may represent a biological event such as the binding of a transcription factor to the genome, locations of chromatin modifications, or even a background or control condition. Numerous algorithms have been developed to extract different kinds of information from such data. However, there has been very little focus on the reconstruction of the genomic signal itself. Such reconstructions may be useful for a variety of purposes ranging from simple visualization of the signals to sophisticated comparison of different datasets.
Here, we propose that adaptive-bandwidth kernel density estimators are well-suited for genomic signal reconstructions. This class of estimators is a natural extension of the fixed-bandwidth estimators that have been employed in several existing ChIP-Seq analysis programs.
Using a set of ChIP-Seq datasets from the ENCODE project, we show that adaptive-bandwidth estimators have greater accuracy at signal reconstruction compared to fixed-bandwidth estimators, and that they have significant advantages in terms of visualization as well. For both fixed and adaptive-bandwidth schemes, we demonstrate that smoothing parameters can be set automatically using a held-out set of tuning data. We also carry out a computational complexity analysis of the different schemes and confirm through experimentation that the necessary computations can be readily carried out on a modern workstation without any significant issues.
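The difference between fixed and adaptive bandwidths can be sketched in a few lines. The version below uses a sample-point estimator with the square-root law (local bandwidth shrinks where a fixed-bandwidth pilot estimate is high), which is one common adaptive scheme and is an assumption here, not necessarily the estimator used in the study. The read positions and the pilot bandwidth of 10 are made up.

```python
import math

def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def fixed_kde(x, points, h):
    """Fixed-bandwidth estimate: every read uses the same bandwidth h."""
    return sum(gaussian((x - p) / h) for p in points) / (len(points) * h)

def adaptive_bandwidths(points, h):
    """Per-read bandwidths: h scaled by sqrt(g / pilot density at the read)."""
    pilot = [fixed_kde(p, points, h) for p in points]
    # g is the geometric mean of the pilot densities.
    g = math.exp(sum(math.log(d) for d in pilot) / len(pilot))
    return [h * math.sqrt(g / d) for d in pilot]

def adaptive_kde(x, points, bandwidths):
    """Adaptive estimate: each read contributes a kernel with its own bandwidth."""
    return sum(gaussian((x - p) / hi) / hi
               for p, hi in zip(points, bandwidths)) / len(points)

# Hypothetical read start positions: a dense cluster plus two isolated reads.
reads = [100, 102, 103, 104, 105, 106, 108, 400, 900]
bandwidths = adaptive_bandwidths(reads, h=10.0)
signal_at_peak = adaptive_kde(104, reads, bandwidths)
```

Reads inside the cluster receive narrower kernels than the isolated reads, so peaks stay sharp while sparse regions are smoothed more heavily.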
PMCID: PMC4043421  PMID: 24564977
3.  Multi-task feature selection in microarray data by binary integer programming 
BMC Proceedings  2013;7(Suppl 7):S5.
A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.
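To make the "maximal discriminative power, minimal redundancy" objective concrete, here is a toy sketch. The objective form (pairwise redundancy minus weighted relevance over a subset of fixed size k), the scores, and the `weight` parameter are all assumptions for illustration. The search is plain brute force over subsets, standing in for the relaxed, low-rank optimization the abstract describes, so it is only usable for a handful of features.

```python
from itertools import combinations

def objective(subset, relevance, redundancy, weight=1.0):
    """Quadratic objective over a binary selection: lower is better."""
    rel = sum(relevance[i] for i in subset)
    red = sum(redundancy[i][j] for i in subset for j in subset if i != j)
    return red - weight * rel

def best_subset(relevance, redundancy, k, weight=1.0):
    """Exhaustive search over all size-k subsets (toy-scale only)."""
    n = len(relevance)
    return min(combinations(range(n), k),
               key=lambda s: objective(s, relevance, redundancy, weight))

# Hypothetical scores: features 0 and 1 are both relevant but nearly duplicates.
relevance = [0.9, 0.85, 0.4, 0.7]
redundancy = [
    [0.0, 0.95, 0.1, 0.2],
    [0.95, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.05],
    [0.2, 0.2, 0.05, 0.0],
]

chosen = best_subset(relevance, redundancy, k=2)
```

In this toy case the two most relevant features are not chosen together, because the redundancy term penalizes their near-duplication.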
PMCID: PMC4043987  PMID: 24564944
4.  Better primer design for metagenomics applications by increasing taxonomic distinguishability 
BMC Proceedings  2013;7(Suppl 7):S4.
Current methods of understanding microbiome composition and structure rely on accurately estimating the number of distinct species and their relative abundance. Most of these methods require an efficient PCR whose forward and reverse primers bind well to the same, large number of identifiable species, and produce amplicons that are unique. It is therefore not surprising that currently used universal primers designed many years ago are not as efficient and fail to bind to recently cataloged species. We propose an automated general method of designing PCR primer pairs that abides by primer design rules and uses a current sequence database as input. Since the method is automated, primers can be designed for targeted microbial species or updated as species are added to or deleted from the database. In silico experiments and laboratory experiments confirm the efficacy of the newly designed primers for metagenomics applications.
PMCID: PMC4044206  PMID: 24564926
5.  The impact of structural diversity and parameterization on maps of the protein universe 
BMC Proceedings  2013;7(Suppl 7):S1.
Low dimensional maps of protein structure space (MPSS) provide a powerful global representation of all proteins. In such mappings structural relationships are depicted through spatial adjacency of points, each of which represents a molecule. MPSS can help in understanding the local and global topological characteristics of the structure space, as well as elucidate structure-function relationships within and between sets of proteins. A number of meta- and method-dependent parameters are involved in creating MPSS. However, a systematic investigation of the influence of these parameters on MPSS construction has yet to be carried out. Further, while specific cases in which MPSS outperform pairwise distances for prediction of functional annotations have been noted, no general explanation for this phenomenon has yet been advanced.
We address the above questions within the technical context of creating MPSS by utilizing multidimensional scaling (MDS) for obtaining low-dimensional projections of structure alignment distances.
Results and conclusion
MDS is demonstrated as an effective method for construction of MPSS where related structures are co-located, even when their functional and evolutionary proximity cannot be deduced from distributions of pairwise comparisons alone. In particular, we show that MPSS exceed pairwise distance distributions in predictive capability for those annotations of shared function or origin which are characterized by a high level of structural diversity. We also determine the impact of the choice of structure alignment and MDS algorithms on the accuracy of such predictions.
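For readers unfamiliar with MDS, the following is a minimal pure-Python sketch of classical (Torgerson) MDS: double-center the squared distance matrix, then take the leading eigenvectors, found here by power iteration with deflation. Which MDS variants the study compared is not stated in the abstract, so this particular algorithm is an assumption, and the 3-by-3 distance matrix is a toy stand-in for structure alignment distances.

```python
import math

def double_center(dist):
    """B = -1/2 * J * D^2 * J, the Gram matrix implied by the distances."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

def top_eigenpair(matrix, iterations=500):
    """Dominant eigenpair of a symmetric matrix via power iteration."""
    n = len(matrix)
    vec = [1.0 + i for i in range(n)]  # deterministic, non-constant start
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(n))
                for i in range(n))
    return value, vec

def classical_mds(dist, dims=2):
    """Embed points so that Euclidean distances approximate `dist`."""
    b = double_center(dist)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(b)
        # Non-Euclidean inputs can yield negative eigenvalues; clip them to zero.
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(scale * vec[i])
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords

# Toy distances taken from a 3-4-5 right triangle.
toy_dist = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
coords = classical_mds(toy_dist, dims=2)
```

Because the toy distances are exactly Euclidean in two dimensions, the embedding reproduces them; real alignment distances generally will not embed without some distortion.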
PMCID: PMC4029320  PMID: 24565442
6.  BiCluE - Exact and heuristic algorithms for weighted bi-cluster editing of biomedical data 
BMC Proceedings  2013;7(Suppl 7):S9.
The explosion of biological data has dramatically reformed today's biology research. The biggest challenge to biologists and bioinformaticians is the integration and analysis of large quantities of data to provide meaningful insights. One major problem is the combined analysis of data of different types. Bi-cluster editing, a special case of clustering that partitions two different types of data simultaneously, might be used for several biomedical scenarios. However, the underlying algorithmic problem is NP-hard.
Here we contribute BiCluE, a software package designed to solve the weighted bi-cluster editing problem. It implements (1) an exact algorithm based on fixed-parameter tractability and (2) a polynomial-time greedy heuristic based on solving the hardest part, edge deletions, first. We evaluated its performance on artificial graphs. Afterwards, as an example, we applied our implementation to real-world biomedical data, in this case GWAS data. BiCluE generally works on any kind of data that can be modeled as (weighted or unweighted) bipartite graphs.
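The quantity that bi-cluster editing minimizes can be sketched as a cost function. The encoding below (a positive weight means an edge is present, a negative weight means it is absent, and the absolute value is the price of flipping it) is a common convention for weighted cluster editing and is assumed here; it is not taken from BiCluE itself. Node names and weights are hypothetical, and the function only scores a given clustering, it does not search for one.

```python
def editing_cost(weights, cluster_of):
    """Cost of turning a weighted bipartite graph into disjoint bicliques.

    weights: {(left_node, right_node): w}, w > 0 edge present, w < 0 edge absent.
    cluster_of: {node: cluster_id} for every left and right node.
    """
    cost = 0.0
    for (u, v), w in weights.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        if same_cluster and w < 0:
            cost += -w  # insert a missing edge inside a bi-cluster
        elif not same_cluster and w > 0:
            cost += w   # delete an edge that crosses bi-clusters
    return cost

# Hypothetical graph: genes g1, g2 on the left, SNPs s1, s2 on the right.
weights = {
    ("g1", "s1"): 2.0,
    ("g1", "s2"): -1.0,
    ("g2", "s1"): 0.5,
    ("g2", "s2"): 3.0,
}

two_clusters = {"g1": 0, "s1": 0, "g2": 1, "s2": 1}
one_cluster = {"g1": 0, "s1": 0, "g2": 0, "s2": 0}

cost_two = editing_cost(weights, two_clusters)
cost_one = editing_cost(weights, one_cluster)
```

An exact or heuristic solver would search over clusterings for the one with the lowest such cost; here splitting into two bi-clusters is cheaper than merging everything.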
To our knowledge, this is the first software package solving the weighted bi-cluster editing problem. BiCluE as well as the supplementary results are available online at
PMCID: PMC4044484  PMID: 24565035
7.  MicroRNA identification using linear dimensionality reduction with explicit feature mapping 
BMC Proceedings  2013;7(Suppl 7):S8.
microRNAs are a class of small RNAs, about 20 nt long, which regulate cellular processes in animals and plants. Identifying microRNAs is one of the most important tasks in gene regulation studies. The main features used for identifying these tiny molecules are those in hairpin secondary structures of pre-microRNA.
A new classifier is employed to identify precursor microRNAs from both pseudo hairpins and other non-coding RNAs. This classifier achieves a geometric mean Gm = 92.20% with just three features and 92.91% with seven features.
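Two ideas in this abstract lend themselves to a short sketch: explicitly mapping a small feature vector into a higher-dimensional space, and scoring a classifier by a geometric mean. The degree-2 polynomial map and the definition Gm = sqrt(sensitivity × specificity) are assumptions; the abstract names neither the mapping nor the formula, and the input values are arbitrary.

```python
import math
from itertools import combinations_with_replacement

def explicit_quadratic_map(features):
    """Map features to [1, x_i..., x_i * x_j...] so a linear model can use
    pairwise interactions without a kernel."""
    mapped = [1.0]
    mapped.extend(features)
    mapped.extend(a * b for a, b in combinations_with_replacement(features, 2))
    return mapped

def geometric_mean(sensitivity, specificity):
    """Assumed definition of Gm: sqrt(sensitivity * specificity)."""
    return math.sqrt(sensitivity * specificity)

# Three hypothetical input features become a 10-dimensional vector.
example = explicit_quadratic_map([2.0, 3.0, 5.0])
gm = geometric_mean(0.9, 0.9)
```

With only three input features the mapped vector stays small, which is the regime where an explicit map is cheap compared with a kernel matrix that grows with the number of samples.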
This study shows that linear dimensionality reduction combined with explicit feature mapping, namely miLDR-EM, achieves high performance in classification of microRNAs from other sequences. Also, explicitly mapping data onto a high dimensional space could be a useful alternative to kernel-based methods for large datasets with a small number of features. Moreover, we demonstrate that microRNAs can be accurately identified by just using three properties that involve minimum free energy.
PMCID: PMC4044883  PMID: 24564997
8.  A neural network approach to multi-biomarker panel discovery by high-throughput plasma proteomics profiling of breast cancer 
BMC Proceedings  2013;7(Suppl 7):S10.
In the past several years, there has been increasing interest and enthusiasm in molecular biomarkers as tools for early detection of cancer. Liquid chromatography tandem mass spectrometry (LC/MS/MS) based plasma proteomics profiling technique is a promising technology platform to study candidate protein biomarkers for early detection of cancer. Factors such as inherent variability, protein detectability limitation, and peptide discovery biases among LC/MS/MS platforms have made the classification and prediction of proteomics profiles challenging. Developing proteomics data analysis methods to identify multi-protein biomarker panels for breast cancer diagnosis based on neural networks provides hope for improving both the sensitivity and the specificity of candidate cancer biomarkers for early detection.
In our previous work, we developed a Feed Forward Neural Network (FFNN)-based method to build a classifier for plasma samples of breast cancer and then applied the classifier to predict a blind dataset of breast cancer. However, the optimal combination C* in our previous method was actually determined by applying the trained FFNN to the testing set with the combination. Therefore, in this paper, we applied a three-way data split (training, validation, and testing) to the Feed Forward Neural Network. We found that the FFNN model based on the three-way data split outperforms our previous method: the prediction performance improved from (AUC = 0.8706, precision = 82.5%, accuracy = 82.5%, sensitivity = 82.5%, specificity = 82.5% for the testing set) to (AUC = 0.895, precision = 86.84%, accuracy = 85%, sensitivity = 82.5%, specificity = 87.5% for the testing set).
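The point of the three-way split is that model selection uses the validation set, so the testing set is touched only once. The sketch below shows such a split and the metric definitions. The 60/20/20 proportions and the seed are arbitrary, and the confusion-matrix counts (40 cases, 40 controls) are hypothetical: the abstract does not give the test-set size, and these counts were chosen only because they reproduce the reported percentages.

```python
import random

def three_way_split(samples, train=0.6, validation=0.2, seed=0):
    """Shuffle once, then cut into training, validation, and testing parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def classification_metrics(tp, fp, tn, fn):
    """Standard definitions from a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

train_set, validation_set, test_set = three_way_split(range(100))

# Hypothetical counts consistent with the reported testing-set figures.
metrics = classification_metrics(tp=33, fp=5, tn=35, fn=7)
```

With these counts, sensitivity is 33/40, specificity is 35/40, precision is 33/38, and accuracy is 68/80, matching the 82.5%, 87.5%, 86.84%, and 85% quoted above.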
Further pathway analysis showed that the top three five-marker panels are associated with complement and coagulation cascades, signaling, activation, and hemostasis, which are consistent with previous findings. We believe the new approach is a better solution for multi-biomarker panel discovery and it can be applied to other clinical proteomics.
PMCID: PMC4044889  PMID: 24565503
9.  Highly precise protein-protein interaction prediction based on consensus between template-based and de novo docking methods 
BMC Proceedings  2013;7(Suppl 7):S6.
Elucidation of protein-protein interaction (PPI) networks is important for understanding disease mechanisms and for drug discovery. Tertiary-structure-based in silico PPI prediction methods have been developed with two typical approaches: a method based on template matching with known protein structures and a method based on de novo protein docking. However, the template-based method has a narrow applicable range because of its use of template information, and the de novo docking based method does not have good prediction performance. In addition, both of these in silico prediction methods have insufficient precision, and require validation of the predicted PPIs by biological experiments, leading to considerable expenditure; therefore, PPI prediction methods with greater precision are needed.
We have proposed a new structure-based PPI prediction method by combining template-based prediction and de novo docking prediction. When we applied the method to the human apoptosis signaling pathway, we obtained a precision value of 0.333, which is higher than that achieved using conventional methods (0.231 for PRISM, a template-based method, and 0.145 for MEGADOCK, a non-template-based method), while maintaining an F-measure value (0.285) comparable to that obtained using conventional methods (0.296 for PRISM, and 0.220 for MEGADOCK).
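The trade-off reported here (higher precision at a similar F-measure) follows from how consensus prediction behaves. In the sketch below the consensus is taken to be the intersection of the two methods' predicted pairs, which is an assumption; the abstract says only that the predictions are combined. The protein names, predicted pairs, and reference set are toy data, not PRISM or MEGADOCK output.

```python
def precision_recall_f(predicted, known):
    """Precision, recall, and F-measure of predicted pairs against a reference."""
    tp = len(predicted & known)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(known) if known else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy interaction sets; each pair is stored in sorted order so (A, B) == (A, B).
template_hits = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "E")}
docking_hits = {("A", "B"), ("B", "D"), ("D", "E"), ("A", "E")}
known = {("A", "B"), ("B", "D"), ("C", "E"), ("B", "C")}

consensus = template_hits & docking_hits  # assumed rule: both methods agree

template_scores = precision_recall_f(template_hits, known)
consensus_scores = precision_recall_f(consensus, known)
```

Requiring agreement discards predictions made by only one method, which tends to raise precision and lower recall; the F-measure shows whether the net effect is an improvement.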
Our consensus method successfully predicted a PPI network with greater precision than conventional template/non-template methods, which may thus reduce the cost of validation by laboratory experiments for confirming novel PPIs from predicted PPIs. Therefore, our method may serve as an aid for promoting interactome analysis.
PMCID: PMC4044902  PMID: 24564962
10.  A new approach to enhance the performance of decision tree for classifying gene expression data 
BMC Proceedings  2013;7(Suppl 7):S3.
Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches used to address such classification problems. However, existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets.
By using a new decision tree algorithm in which each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in the literature.
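Why a composite feature can beat any single gene at a node is easy to show on toy data. The sketch below computes AUC by pairwise comparison (the Mann-Whitney formulation) for one gene alone and for a linear combination of two genes. The expression values and the equal weights are hypothetical, and the abstract does not describe how the method actually chooses genes or weights.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def composite(sample, weights):
    """Linear combination of several gene expression values."""
    return sum(w * x for w, x in zip(weights, sample))

# Hypothetical two-gene expression values; label 1 marks the positive class.
samples = [(1, 3), (3, 1), (2, 2), (0, 2), (2, 0), (1, 1)]
labels = [1, 1, 1, 0, 0, 0]

auc_gene1 = auc([s[0] for s in samples], labels)
auc_composite = auc([composite(s, (1.0, 1.0)) for s in samples], labels)
```

Neither gene separates the classes on its own, but their sum does, so a node that splits on the composite feature would be preferred under an AUC criterion.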
We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
PMCID: PMC4044984  PMID: 24564916
13.  Study of the improved Sf9 transient gene expression process 
BMC Proceedings  2013;7(Suppl 6):P19.
PMCID: PMC3980246
17.  Use of microcarriers in Mobius® CellReady bioreactors to support growth of adherent cells 
BMC Proceedings  2013;7(Suppl 6):P95.
The Mobius® CellReady bioreactor product platform incorporates novel disposable technologies that provide optimal performance for suspension mammalian cell culture. Here we show the utility of EMD Millipore's 3L and 50L CellReady single use bioreactors for the cultivation of adherent mammalian cells on microcarriers. Cytodex3® and Solohill® collagen microcarriers were first tested in a mixing study to assess feasibility. We evaluated the normalized mixing speed required in the 3L and 50L to achieve a suspension of the microcarriers and enable growth of the cells.
PMCID: PMC3980274
