1.  Identification of a chronic obstructive pulmonary disease genetic determinant that regulates HHIP 
Human Molecular Genetics  2011;21(6):1325-1335.
Multiple intergenic single-nucleotide polymorphisms (SNPs) near hedgehog interacting protein (HHIP) on chromosome 4q31 have been strongly associated with pulmonary function levels and moderate-to-severe chronic obstructive pulmonary disease (COPD). However, whether the effects of variants in this region are related to HHIP or another gene has not been proven. We confirmed genetic association of SNPs in the 4q31 COPD genome-wide association study (GWAS) region in a Polish cohort containing severe COPD cases and healthy smoking controls (P = 0.001 to 0.002). We found that HHIP expression at both mRNA and protein levels is reduced in COPD lung tissues. We identified a genomic region located ∼85 kb upstream of HHIP which contains a subset of associated SNPs, interacts with the HHIP promoter through a chromatin loop and functions as an HHIP enhancer. The COPD risk haplotype of two SNPs within this enhancer region (rs6537296A and rs1542725C) was associated with statistically significant reductions in HHIP promoter activity. Moreover, rs1542725 demonstrates differential binding to the transcription factor Sp3; the COPD-associated allele exhibits increased Sp3 binding, which is consistent with Sp3's usual function as a transcriptional repressor. Thus, increased Sp3 binding at a functional SNP within the chromosome 4q31 COPD GWAS locus leads to reduced HHIP expression and increased susceptibility to COPD through distal transcriptional regulation. Together, our findings reveal one mechanism through which SNPs upstream of the HHIP gene modulate the expression of HHIP and functionally implicate reduced HHIP gene expression in the pathogenesis of COPD.
doi:10.1093/hmg/ddr569
PMCID: PMC3284120  PMID: 22140090
2.  Complete Genome Sequence of Goose Tembusu Virus, Isolated from Jiangnan White Geese in Jiangsu, China 
Genome Announcements  2013;1(2):e00236-12.
Avian tembusu virus (TMUV), which was first identified in eastern China, is an emerging virus causing serious economic losses in the Chinese poultry industry. Here, we report the complete genome sequence of goose tembusu virus strain JS804, isolated from Jiangnan white geese with severe neurological signs. The genome of JS804 is 10,990 nucleotides (nt) in length and contains a single open reading frame encoding a putative polyprotein of 3,425 amino acids. Analysis of the complete tembusu virus sequence will help to further clarify the molecular and evolutionary characteristics and pathogenesis of this virus.
doi:10.1128/genomeA.00236-12
PMCID: PMC3593326
3.  Prediction of S-Glutathionylation Sites Based on Protein Sequences 
PLoS ONE  2013;8(2):e55512.
S-glutathionylation, the reversible formation of mixed disulfides between glutathione (GSH) and cysteine residues in proteins, is a specific form of post-translational modification that plays important roles in various biological processes, including signal transduction, redox homeostasis, and metabolism inside cells. Experimentally identifying S-glutathionylation sites is labor-intensive and time-consuming, whereas bioinformatics methods offer an alternative by predicting S-glutathionylation sites in silico. The bioinformatics approaches give not only candidate sites for further experimental verification but also biochemical insights into the mechanism of S-glutathionylation. In this paper, we first collect experimentally determined S-glutathionylated proteins and their corresponding modification sites from the literature, and then propose a new method for predicting S-glutathionylation sites by employing machine learning methods based on protein sequence data. Our method achieves an AUC (area under the ROC curve) score of 0.879 in 5-fold cross-validation, which demonstrates its predictive power. The datasets used in this work are available at http://csb.shu.edu.cn/SGDB.
doi:10.1371/journal.pone.0055512
PMCID: PMC3572087  PMID: 23418443
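The evaluation reported in entry 3 (an AUC score under 5-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not name the classifier, so `fit_and_score` is a hypothetical callback, and pooling the held-out scores before computing one AUC is an assumed convention (averaging per-fold AUCs is another common choice).

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_auc(fit_and_score, labels, k=5):
    """Pool held-out scores from k folds, then compute a single AUC.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that trains
    on train_idx and returns one score per index in test_idx.
    """
    n = len(labels)
    pooled = [0.0] * n
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        for i, s in zip(test_idx, fit_and_score(train_idx, test_idx)):
            pooled[i] = s
    return auc(pooled, labels)
```

In practice the candidate cysteine sites would be shuffled (or stratified) before splitting, since contiguous folds can otherwise be dominated by one protein or one class.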
4.  Self-Renewal and Differentiation Capacity of Urine-Derived Stem Cells after Urine Preservation for 24 Hours 
PLoS ONE  2013;8(1):e53980.
Despite successful approaches to preserve organs, tissues, and isolated cells, the maintenance of stem cell viability and function in body fluids during storage for cell distribution and transportation remains unexplored. The aim of this study was to characterize urine-derived stem cells (USCs) after optimal preservation of urine specimens for up to 24 hours. A total of 415 urine specimens were collected from 12 healthy men (age range 20–54 years old). About 6×10^4 cells were shed from the urinary tract system in 24 hours. At least 100 USC clones were obtained from the stored urine specimens after 24 hours and maintained similar biological features to fresh USCs. The stored USCs had a “rice grain” shape in primary culture, and expressed mesenchymal stem cell surface markers, high telomerase activity, and normal karyotypes. Importantly, the preserved cells retained bipotent differentiation capacity. Differentiated USCs expressed myogenic specific proteins and contractile function when exposed to myogenic differentiation medium, and they expressed urothelial cell-specific markers and barrier function when exposed to urothelial differentiation medium. These data demonstrated that up to 75% of fresh USCs can be safely preserved in urine for 24 hours and that these cells stored in urine retain their original stem cell properties, indicating that preserved USCs could be available for potential use in cell-based therapy or clinical diagnosis.
doi:10.1371/journal.pone.0053980
PMCID: PMC3548815  PMID: 23349776
5.  A Computational model for compressed sensing RNAi cellular screening 
BMC Bioinformatics  2012;13:337.
Background
RNA interference (RNAi) has become an increasingly important and effective genetic tool to study the function of target genes by suppressing specific genes of interest. This system approach helps identify signaling pathways and cellular phase types by tracking intensity and/or morphological changes of cells. The traditional RNAi screening scheme, in which one siRNA is designed to knock down one specific mRNA target, needs a large library of siRNAs and turns out to be time-consuming and expensive.
Results
In this paper, we propose a conceptual model, called compressed sensing RNAi (csRNAi), which employs a unique combination of groups of small interfering RNAs (siRNAs) to knock down a much larger number of genes. This strategy is based on the fact that one gene can be partially bound by several siRNAs and, conversely, one siRNA can bind to a few genes with distinct binding affinities. This model constructs a many-to-many correspondence between siRNAs and their targets, with far fewer siRNAs than mRNA targets compared with the conventional scheme. Mathematically, this problem involves an underdetermined system of equations (linear or nonlinear), which is ill-posed in general. However, the recently developed compressed sensing (CS) theory can solve this problem. We present a mathematical model to describe the csRNAi system based on both CS theory and biological concerns. To build this model, we first search for nucleotide motifs in a target gene set. Then we propose a machine learning based method to find the effective siRNAs with novel features, such as image features and speech features, to describe an siRNA sequence. Numerical simulations show that we can reduce the siRNA library to one third of that in the conventional scheme. In addition, the features used to describe siRNAs substantially outperform the existing ones.
Conclusions
This csRNAi system is very promising in saving both time and cost for large-scale RNAi screening experiments which may benefit the biological research with respect to cellular processes and pathways.
doi:10.1186/1471-2105-13-337
PMCID: PMC3544734  PMID: 23270311
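The underdetermined system mentioned in entry 5 can be illustrated with a small sparse-recovery sketch. The abstract does not say which CS solver the authors use, so matching pursuit (a simple greedy decoder) is swapped in here purely for illustration. The mapping is also an assumption: each row of the sensing matrix stands for one siRNA readout, each column for one gene, and the unknown vector holds a sparse set of gene effects.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(columns, y, max_iter=10, tol=1e-9):
    """Greedy sparse recovery of x in y = A x, with A given as a list of columns.

    Illustrative only: rows are assumed to be siRNA readouts and columns genes.
    """
    x = [0.0] * len(columns)
    residual = list(y)
    for _ in range(max_iter):
        if dot(residual, residual) <= tol:
            break  # measurements fully explained
        # Pick the column most correlated with the residual (norm-adjusted).
        best_j, best_score = None, 0.0
        for j, col in enumerate(columns):
            score = abs(dot(col, residual)) / dot(col, col) ** 0.5
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break  # residual is orthogonal to every column
        col = columns[best_j]
        step = dot(col, residual) / dot(col, col)
        x[best_j] += step
        residual = [r - step * c for r, c in zip(residual, col)]
    return x
```

With two measurements and four unknowns the system has infinitely many solutions, but if only one gene is active the greedy step picks it out, which is the intuition behind using fewer siRNAs than targets.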
6.  Multi-scale agent-based brain cancer modeling and prediction of TKI treatment response: Incorporating EGFR signaling pathway and angiogenesis 
BMC Bioinformatics  2012;13:218.
Background
The epidermal growth factor receptor (EGFR) signaling pathway and angiogenesis in brain cancer act as an engine for tumor initiation, expansion and response to therapy. Since the existing literature does not have any models that investigate the impact of both angiogenesis and molecular signaling pathways on treatment, we propose a novel multi-scale, agent-based computational model that includes both angiogenesis and EGFR modules to study the response of brain cancer to tyrosine kinase inhibitor (TKI) treatment.
Results
The novel angiogenesis module integrated into the agent-based tumor model is based on a set of reaction–diffusion equations that describe the spatio-temporal evolution of the distributions of micro-environmental factors such as glucose, oxygen, TGFα, VEGF and fibronectin. These molecular species regulate tumor growth during angiogenesis. Each tumor cell is equipped with an EGFR signaling pathway linked to a cell-cycle pathway to determine its phenotype. EGFR TKIs are delivered through the blood vessels of tumor microvasculature and the response to treatment is studied.
Conclusions
Our simulations demonstrated that the entire tumor growth profile is a collective behaviour of cells regulated by the EGFR signaling pathway and the cell cycle. We also found that angiogenesis has a dual effect under TKI treatment: on one hand, TKIs are delivered through the neo-vasculature to decrease tumor invasion; on the other hand, the neo-vasculature can transport glucose and oxygen to tumor cells to maintain their metabolism, which results in an increase of cell survival rate in the late simulation stages.
doi:10.1186/1471-2105-13-218
PMCID: PMC3487967  PMID: 22935054
Multi-scale; Agent-based modeling; EGFR signaling pathway; Angiogenesis; TKI treatment
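The reaction–diffusion equations mentioned in entry 6 can be illustrated with one explicit finite-difference update for a single species in one dimension. This is only a sketch under stated assumptions: the abstract does not give the authors' equations, grid or boundary conditions, so the form du/dt = D d²u/dx² - decay·u + source, the zero-flux boundaries and all parameter names are hypothetical.

```python
def reaction_diffusion_step(u, D, dx, dt, decay=0.0, source=None):
    """One explicit Euler step of du/dt = D*u_xx - decay*u + source on a 1-D grid.

    Zero-flux (reflecting) boundaries are assumed. All parameters are
    illustrative and not taken from the paper.
    """
    if D * dt / dx ** 2 > 0.5:
        raise ValueError("explicit scheme unstable: need D*dt/dx^2 <= 0.5")
    n = len(u)
    src = source if source is not None else [0.0] * n
    new = []
    for i in range(n):
        left = u[i - 1] if i > 0 else u[i]        # mirror value at left wall
        right = u[i + 1] if i < n - 1 else u[i]   # mirror value at right wall
        laplacian = (left - 2.0 * u[i] + right) / dx ** 2
        new.append(u[i] + dt * (D * laplacian - decay * u[i] + src[i]))
    return new
```

In an agent-based setting, a field such as glucose or oxygen would be advanced with a step like this between cell updates, with `source` set at vessel locations and `decay` standing in for cellular uptake.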
7.  Computer-assisted lip diagnosis on traditional Chinese medicine using multi-class support vector machines 
Background
In Traditional Chinese Medicine (TCM), lip diagnosis is an important diagnostic method which has a long history and is applied widely. The lip color of a person is considered a symptom that reflects the physical condition of organs in the body. However, the traditional diagnostic approach is mainly based on observation by the doctor's naked eye, which is non-quantitative and subjective. The non-quantitative approach largely depends on the doctor's experience and affects the accuracy of diagnosis and treatment in TCM. Developing new quantification methods to identify the exact syndrome based on the lip diagnosis of TCM has become urgent and important. In this paper, we design a computer-assisted classification model to provide an automatic and quantitative approach for the diagnosis of TCM based on lip images.
Methods
A computer-assisted classification method is designed and applied for syndrome diagnosis based on lip images. Our purpose is to classify the lip images into four groups: deep-red, red, purple and pale. The proposed scheme consists of four steps: lip image preprocessing, image feature extraction, feature selection and classification. The 84 extracted features comprise lip color space component, texture and moment features. Feature subset selection is performed by using SVM-RFE (Support Vector Machine with recursive feature elimination), mRMR (minimum Redundancy Maximum Relevance) and IG (information gain). The classification model is constructed based on the collected lip image features using multi-class SVM and weighted multi-class SVM (WSVM). In addition, we compare the diagnosis performance of SVM with that of the k-nearest neighbor (kNN) algorithm, the Multiple Asymmetric Partial Least Squares Classifier (MAPLSC) and Naïve Bayes. Consent was obtained from the participants for all displayed face images.
Results
A total of 257 lip images are collected for the modeling of lip diagnosis in TCM. The feature selection method SVM-RFE selects 9 important features, which are composed of 5 color component features, 3 texture features and 1 moment feature. SVM, MAPLSC, Naïve Bayes and kNN showed better classification results based on the 9 selected features than the results obtained from all 84 features. The total classification accuracy of the five methods is 84%, 81%, 79%, 81% and 77%, respectively. SVM thus achieves the best classification accuracy. The classification accuracy of SVM is 81%, 71%, 89% and 86% on the Deep-red, Pale, Purple and Red lip image models, respectively. With the feature selection algorithms mRMR and IG, WSVM achieves the best total classification accuracy. Therefore, the results show that the system achieves its best classification accuracy when SVM classifiers are combined with the SVM-RFE feature selection algorithm.
Conclusions
A diagnostic system is proposed, which first segments the lip from the original facial image based on the Chan-Vese level set model and the Otsu method, then extracts three kinds of features (color space features, Haralick co-occurrence features and Zernike moment features) from the lip image. SVM-RFE is then adopted to select the optimal features. Finally, SVM is applied to classify the four classes. We also compare different feature selection algorithms and classifiers to verify our system. The developed automatic and quantitative diagnosis system of TCM is thus effective in distinguishing four lip image classes: Deep-red, Purple, Red and Pale. This study puts forward a new method and idea for the quantitative examination on lip diagnosis of TCM, and provides a template for objective diagnosis in TCM.
doi:10.1186/1472-6882-12-127
PMCID: PMC3522569  PMID: 22898352
Traditional Chinese medicine; Computer-assisted lip diagnosis; Image analysis; Feature selection; Support vector machine
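Of the three feature selection methods named in entry 7, information gain (IG) is the simplest to sketch. The example below is an illustration rather than the authors' pipeline: it assumes each feature has already been discretised into bins (real color, texture and moment features are continuous), and the feature names in the usage note are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by decreasing information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)
```

For instance, `rank_features({"hue_bin": [...], "texture_bin": [...]}, classes)` would order the two (hypothetical) binned features by how much each reduces uncertainty about the lip color class; the top-ranked subset is then passed to the classifier.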
8.  Cancer stem cell, niche and EGFR decide tumor development and treatment response: A bio-computational Simulation Study 
Journal of theoretical biology  2010;269(1):138-149.
Recent research in cancer biology has suggested the hypothesis that tumors are initiated and driven by a small group of cancer stem cells (CSCs). Furthermore, cancer stem cell niches have been found to be essential in determining fates of CSCs, and several signaling pathways have been proven to play a crucial role in cellular behavior, which could be two important factors in cancer development. To better understand the progression, heterogeneity and treatment response of breast cancer, especially in the context of CSCs, we propose a mathematical model based on the cell compartment method. In this model, three compartments of cellular subpopulations are constructed: CSCs, progenitor cells (PCs), and terminally differentiated cells (TCs). Moreover, 1) the cancer stem cell niche is considered by modeling its effect on division patterns (symmetric or asymmetric) of CSCs, and 2) the EGFR signaling pathway is integrated by modeling its role in cell proliferation and apoptosis. Our simulation results indicate that 1) a higher probability for symmetric division of CSCs may result in a faster expansion of the tumor population, and for a larger number of niches, the tumor grows at a slower rate, but the final tumor volume is larger; 2) higher EGFR expression correlates with tumors of larger volume, while a saturation function is observed; and 3) treatments that inhibit tyrosine kinase activity of EGFR may not only repress the tumor volume, but also decrease the CSC percentage by shifting CSCs from symmetric divisions to asymmetric divisions. These findings suggest that therapies should be designed to effectively control or eliminate the symmetric division of CSCs and to reduce or destroy the CSC niches.
doi:10.1016/j.jtbi.2010.10.016
PMCID: PMC3153880  PMID: 20969880
mathematical model; compartment method; signaling pathway; breast cancer; tyrosine kinase inhibitors
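The three-compartment structure described in entry 8 (CSCs, PCs, TCs, with symmetric versus asymmetric CSC division) can be sketched as a deterministic discrete-time model. This is a minimal illustration, not the published model: niche effects and EGFR signaling are left out, and every rate, the assumption that a PC yields two TCs, and all parameter names are hypothetical.

```python
def simulate_compartments(steps, p_sym, csc0=10.0, div_rate=0.1,
                          pc_diff_rate=0.2, tc_death_rate=0.05):
    """Mean-field update of (CSC, PC, TC) counts; returns the full history.

    p_sym is the probability that a dividing CSC divides symmetrically.
    All rates are illustrative values, not fitted parameters.
    """
    csc, pc, tc = csc0, 0.0, 0.0
    history = [(csc, pc, tc)]
    for _ in range(steps):
        divisions = div_rate * csc
        # Symmetric division: CSC -> 2 CSC (net gain of one CSC).
        d_csc = p_sym * divisions
        # Asymmetric division: CSC -> CSC + PC; PCs leave by differentiating.
        d_pc = (1.0 - p_sym) * divisions - pc_diff_rate * pc
        # Assumed: each differentiating PC yields two TCs; TCs die at a fixed rate.
        d_tc = 2.0 * pc_diff_rate * pc - tc_death_rate * tc
        csc, pc, tc = csc + d_csc, pc + d_pc, tc + d_tc
        history.append((csc, pc, tc))
    return history
```

Comparing runs with a low and a high `p_sym` shows the qualitative effect the abstract reports: with purely asymmetric division the CSC pool stays constant, while a larger symmetric fraction makes the CSC pool, and eventually the whole population, grow exponentially.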
9.  Developing a multiscale, multi-resolution agent-based brain tumor model by graphics processing units 
Multiscale agent-based modeling (MABM) has been widely used to simulate Glioblastoma Multiforme (GBM) and its progression. At the intracellular level, the MABM approach employs a system of ordinary differential equations to describe quantitatively specific intracellular molecular pathways that determine phenotypic switches among cells (e.g. from migration to proliferation and vice versa). At the intercellular level, MABM describes cell-cell interactions by a discrete module. At the tissue level, partial differential equations are employed to model the diffusion of chemoattractants, which are the input factors of the intracellular molecular pathway. Moreover, multiscale analysis makes it possible to explore the molecules that play important roles in determining the cellular phenotypic switches that in turn drive the whole GBM expansion. However, owing to limited computational resources, MABM is currently a theoretical biological model that uses relatively coarse grids to simulate a few cancer cells in a small slice of brain cancer tissue. In order to improve this theoretical model to simulate and predict actual GBM cancer progression in real time, a graphics processing unit (GPU)-based parallel computing algorithm was developed and combined with the multi-resolution design to speed up the MABM. The simulated results demonstrated that the GPU-based, multi-resolution and multiscale approach can accelerate the previous MABM around 30-fold with relatively fine grids in a large extracellular matrix. Therefore, the new model has great potential for simulating and predicting real-time GBM progression, if real experimental data are incorporated.
doi:10.1186/1742-4682-8-46
PMCID: PMC3312859  PMID: 22176732
10.  Clld7, a candidate tumor suppressor on chromosome 13q14, regulates pathways of DNA damage/repair and apoptosis 
Cancer research  2010;70(22):9434-9443.
Chronic lymphocytic leukemia deletion gene 7 (Clld7) is a candidate tumor suppressor on chromosome 13q14. Clld7 encodes an evolutionarily conserved protein that contains an RCC1 domain plus broad complex, tramtrack, bric-a-brac (BTB) and POZ domains. In this study, we investigated the biological functions of Clld7 protein in inducible osteosarcoma cell lines. Clld7 induction inhibited cell growth, decreased cell viability, and increased gamma-H2AX staining under conditions of caspase inhibition, indicating activation of the DNA damage/repair pathway. Real-time PCR analysis in tumor cells and normal human epithelial cells revealed Clld7 target genes that regulate DNA repair responses. Furthermore, depletion of Clld7 in normal human epithelial cells conferred resistance to apoptosis triggered by DNA damage. Taken together, the biological actions of Clld7 are consistent with those of a tumor suppressor.
doi:10.1158/0008-5472.CAN-10-1960
PMCID: PMC2982930  PMID: 20926398
tumor suppressor; DNA damage/repair pathway; 13q14; primary human keratinocyte; apoptosis
11.  Using manifold embedding for assessing and predicting protein interactions from high-throughput experimental data 
Bioinformatics  2010;26(21):2744-2751.
Motivation: High-throughput protein interaction data, with ever-increasing volume, are becoming the foundation of many biological discoveries, and thus high-quality protein–protein interaction (PPI) maps are critical for a deeper understanding of cellular processes. However, the unreliability and paucity of currently available PPI data are key obstacles to subsequent quantitative studies. It is therefore highly desirable to develop an approach to deal with these issues from the computational perspective. Most previous works for assessing and predicting protein interactions either need supporting evidence from multiple information resources or are severely impacted by the sparseness of PPI networks.
Results: We developed a robust manifold embedding technique for assessing the reliability of interactions and predicting new interactions, which purely utilizes the topological information of PPI networks and can work on a sparse input protein interactome without requiring additional information types. After transforming a given PPI network into a low-dimensional metric space using manifold embedding based on isometric feature mapping (ISOMAP), the problem of assessing and predicting protein interactions is recast as measuring similarity between points in that metric space. Then a reliability index, a likelihood indicating the interaction of two proteins, is assigned to each protein pair in the PPI networks based on the similarity between the points in the embedded space. The proposed method is validated with extensive experiments on densely connected and sparse PPI networks of yeast. Results demonstrate that the interactions ranked top by our method have high functional homogeneity and localization coherence; in particular, our method is very efficient for large sparse PPI networks, on which traditional algorithms fail. Therefore, the proposed algorithm is a much more promising method to detect both false positive and false negative interactions in PPI networks.
Availability: MATLAB code implementing the algorithm is available from the web site http://home.ustc.edu.cn/∼yzh33108/Manifold.htm.
Contact: dshuang@iim.ac.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq510
PMCID: PMC3025743  PMID: 20817744
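The first stage of ISOMAP, on which the method in entry 11 builds, replaces direct distances with shortest-path (geodesic) distances on the graph. The sketch below covers only that stage for an unweighted PPI network, using the Floyd-Warshall algorithm; the subsequent low-dimensional embedding (classical multidimensional scaling) and the authors' reliability index are not reproduced, and the integer node labelling is an assumption made for brevity.

```python
from math import inf


def geodesic_distances(n, edges):
    """All-pairs shortest-path lengths on an undirected, unweighted graph.

    Nodes are assumed to be labelled 0..n-1; edges is a list of (a, b) pairs.
    Unreachable pairs keep a distance of infinity.
    """
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for a, b in edges:
        d[a][b] = d[b][a] = 1
    # Floyd-Warshall: allow paths through intermediate node k, one k at a time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```

The resulting matrix is what the embedding step would take as input; protein pairs that end up close in the embedded space are the ones a similarity-based index would score as likely interactors. The cubic running time is acceptable for a sketch but would need replacing (for example by repeated breadth-first search) on a full interactome.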
12.  A computational framework for studying neuron morphology from in vitro high content neuron-based screening 
Journal of neuroscience methods  2010;190(2):299-309.
High content neuron image processing is considered an important method for quantitative neurobiological studies. The main goal of this paper is to provide automatic image processing approaches for neuron images in order to study neuronal mechanisms in high content screening. In the nuclei channel, all nuclei are segmented and detected by applying the gradient vector field based watershed. Then the neuronal nuclei are selected based on the soma region detected in the neurite channel. In neurite images, we propose a novel neurite centerline extraction approach using an improved line-pixel detection technique. The proposed neurite tracing method detects curvilinear structures more accurately than existing methods. Finally, an interface called NeuriteIQ, based on the proposed algorithms, is developed for application in high content screening.
doi:10.1016/j.jneumeth.2010.05.012
PMCID: PMC3184395  PMID: 20580743
High content screening; Microscopy image; Nuclei segmentation; Neurite outgrowth; Line-pixel detection; Branch area
13.  Image-Based Chemical Screening Identifies Drug Efflux Inhibitors In Lung Cancer Cells 
Cancer research  2010;70(19):7723-7733.
Cancer cells with active drug-efflux capability are multidrug resistant and pose a significant obstacle for the efficacy of chemotherapy. Moreover, recent evidence suggests that high drug-efflux cancer cells (HDECCs) may be selectively enriched with stem-like cancer cells, which are believed to be the cause for tumor initiation and recurrence. There is a great need for therapeutic reagents that are capable of eliminating HDECCs. We developed an image-based high-content screening (HCS) system to specifically identify and analyze the HDECC population in lung cancer cells. Using the system, we screened 1,280 pharmacologically active compounds and identified twelve potent HDECC inhibitors. We show that these inhibitors are able to overcome multidrug resistance (MDR) and sensitize HDECCs to chemotherapeutic drugs, or directly reduce the tumorigenicity of lung cancer cells, possibly by affecting stem-like cancer cells. The HCS system we established provides a new approach for identifying therapeutic reagents that overcome MDR. The compounds identified by the screening may be used as adjuvants to improve the efficacy of chemotherapeutic drugs.
doi:10.1158/0008-5472.CAN-09-4360
PMCID: PMC2948619  PMID: 20841476
high drug-efflux cancer cells; multidrug resistance; high content screening; image-based assay
14.  A Time-Series Method for Automated Measurement of Changes in Mitotic and Interphase Duration from Time-Lapse Movies 
PLoS ONE  2011;6(9):e25511.
Background
Automated time-lapse microscopy can visualize proliferation of large numbers of individual cells, enabling accurate measurement of the frequency of cell division and the duration of interphase and mitosis. However, extraction of quantitative information by manual inspection of time-lapse movies is too time-consuming to be useful for analysis of large experiments.
Methodology/Principal Findings
Here we present an automated time-series approach that can measure changes in the duration of mitosis and interphase in individual cells expressing fluorescent histone 2B. The approach requires analysis of only 2 features, nuclear area and average intensity. Compared to supervised learning approaches, this method reduces processing time and does not require generation of training data sets. We demonstrate that this method is as sensitive as manual analysis in identifying small changes in interphase or mitotic duration induced by drug or siRNA treatment.
Conclusions/Significance
This approach should facilitate automated analysis of high-throughput time-lapse data sets to identify small molecules or gene products that influence timing of cell division.
doi:10.1371/journal.pone.0025511
PMCID: PMC3180452  PMID: 21966537
15.  ONLINE THREE-DIMENSIONAL DENDRITIC SPINES MORPHOLOGICAL CLASSIFICATION BASED ON SEMI-SUPERVISED LEARNING 
Recent studies on neuron imaging show that there is a strong relationship between the functional properties of a neuron and its morphology, especially its dendritic spine structures. However, most of the current methods for morphological spine classification only concern features in two-dimensional (2D) space, which consequently decreases the accuracy of dendritic spine analysis. In this paper, we propose a semi-supervised learning (SSL) framework, in which spine phenotypes in three-dimensional (3D) space are considered. With training only on a few pre-classified inputs, the rest of the spines can be identified effectively. We also derived a new scheme using an affinity matrix between features to further improve the accuracy. Our experimental results indicate that a small training dataset is sufficient to classify detected dendritic spines.
doi:10.1109/ISBI.2009.5193228
PMCID: PMC3171508  PMID: 21922077
dendritic spine; semi-supervised learning; morphological spine classification
16.  An Image Driven Systems Biology Approach for Neurodegenerative Disease Studies in the TSC-mTOR Pathway 
In this brief paper we present an overview of the TSC-mTOR pathway and its importance in neurodegenerative disease (ND). We illustrate the influence of ND on dendritic spine morphology. Then we discuss some details of functional gene networks (FGN) and use this information to propose an image driven systems biology approach for the construction of an FGN for ND. We conclude by discussing its importance and the prospective outcome of our study.
doi:10.1109/LISSA.2009.4906703
PMCID: PMC3171509  PMID: 21922078
17.  An enhanced Petri-net model to predict synergistic effects of pairwise drug combinations from gene microarray data 
Bioinformatics  2011;27(13):i310-i316.
Motivation: Prediction of synergistic effects of drug combinations has traditionally relied on phenotypic response data. However, such methods cannot be used to identify molecular signaling mechanisms of synergistic drug combinations. In this article, we propose an enhanced Petri-Net (EPN) model to recognize the synergistic effects of drug combinations from the molecular response profiles, i.e. drug-treated microarray data.
Methods: We addressed the downstream signaling network of the targets for the two individual drugs used in the pairwise combinations and applied EPN to the identified targeted signaling network. In EPN, drugs and signaling molecules are assigned to different types of places, while drug doses and molecular expressions are denoted by color tokens. The changes of molecular expressions caused by drug treatments are simulated by two actions of EPN: firing and blasting. Firing moves the drug and molecule tokens from one node or place to another, and blasting reduces the number of molecule tokens by the drug tokens in a molecule node. The goal of EPN is to transform the state characterized by the untreated control condition into that of the treated condition and to depict the drug effects on molecules by the drug tokens.
Results: We applied EPN to our generated pairwise drug combination microarray data. The synergistic predictions using EPN are consistent with those predicted using phenotypic response data. The molecules responsible for the synergistic effects with their associated feedback loops display the mechanisms of synergism.
Availability: The software, implemented in Python 2.7, is available on request.
Contact: stwong@tmhs.org
doi:10.1093/bioinformatics/btr202
PMCID: PMC3117391  PMID: 21685086
18.  NSMAP: A method for spliced isoforms identification and quantification from RNA-Seq 
BMC Bioinformatics  2011;12:162.
Background
The development of techniques for sequencing messenger RNA (RNA-Seq) makes it possible to study biological mechanisms such as alternative splicing and gene expression regulation more deeply and accurately. Most existing methods employ RNA-Seq to quantify the expression levels of already annotated isoforms from the reference genome. However, the current reference genome is very incomplete due to the complexity of the transcriptome, which hinders the comprehensive investigation of the transcriptome using RNA-Seq. New methods for isoform inference and estimation purely from RNA-Seq, without annotation information, are desirable.
Results
A Nonnegativity and Sparsity constrained Maximum APosteriori (NSMAP) model has been proposed to estimate the expression levels of isoforms from RNA-Seq data without the annotation information. In contrast to previous methods, NSMAP performs identification of the structures of expressed isoforms and estimation of the expression levels of those expressed isoforms simultaneously, which enables better identification of isoforms. In the simulations parameterized by two real RNA-Seq data sets, more than 77% of expressed isoforms are correctly identified and quantified. Then, we apply NSMAP to two RNA-Seq data sets of myelodysplastic syndromes (MDS) samples and one normal sample in order to identify differentially expressed known and novel isoforms in MDS disease.
Conclusions
NSMAP provides a good strategy to identify and quantify novel isoforms without knowledge of an annotated reference genome, which can further realize the potential of the RNA-Seq technique in transcriptome analysis. The NSMAP package is freely available at https://sites.google.com/site/nsmapforrnaseq.
doi:10.1186/1471-2105-12-162
PMCID: PMC3113944  PMID: 21575225
19.  Bayesian Peptide Peak Detection for High Resolution TOF Mass Spectrometry 
In this paper, we address the issue of peptide ion peak detection for high resolution time-of-flight (TOF) mass spectrometry (MS) data. A novel Bayesian peptide ion peak detection method is proposed for TOF data with resolution of 10 000–15 000 full width at half-maximum (FWHM). MS spectra exhibit distinct characteristics at this resolution, which are captured in a novel parametric model. Based on the proposed parametric model, a Bayesian peak detection algorithm based on Markov chain Monte Carlo (MCMC) sampling is developed. The proposed algorithm is tested on both simulated and real datasets. The results show a significant improvement in detection performance over a commonly employed method. The results also agree with visual inspection by experts. Moreover, better detection consistency is achieved across MS datasets from patients with identical pathological condition.
doi:10.1109/TSP.2010.2065226
PMCID: PMC3085289  PMID: 21544266
Bayesian methods; Markov chain Monte Carlo; mass spectrometry; peptide peak detection; time-of-flight
20.  Drug Inhibition Profile Prediction for NFκB Pathway in Multiple Myeloma 
PLoS ONE  2011;6(3):e14750.
Nuclear factor κB (NFκB) activation plays a crucial role in anti-apoptotic responses to apoptotic signaling during tumor necrosis factor (TNFα) stimulation in Multiple Myeloma (MM). Although several drugs have been found effective for the treatment of MM, mainly by inhibiting the NFκB pathway, there are no quantitative or qualitative comparisons of the inhibition effects of different drugs used either alone or in combination. Computational modeling is becoming increasingly indispensable for applied biological research, mainly because it can provide strong quantitative predictive power. In this study, a novel computational pathway modeling approach is employed to comparatively assess the inhibition effects of specific drugs used alone or in combination on the NFκB pathway in MM and to predict potential synergistic drug combinations.
doi:10.1371/journal.pone.0014750
PMCID: PMC3051063  PMID: 21408099
21.  Integrative analysis of next generation sequencing for small non-coding RNAs and transcriptional regulation in Myelodysplastic Syndromes 
BMC Medical Genomics  2011;4:19.
Background
Myelodysplastic Syndromes (MDS) are pre-leukemic disorders with increasing incidence rates worldwide, but very limited treatment options. Little is known about small regulatory RNAs and how they contribute to pathogenesis, progression and transcriptome changes in MDS.
Methods
Patients' primary marrow cells were screened for short RNAs (RNA-seq) using next generation sequencing. Exon arrays from the same cells were used to profile gene expression, and additional measures on 98 patients were obtained. Integrative bioinformatics algorithms were proposed, and pathway and ontology analyses were performed.
Results
In low-grade MDS, observations implied extensive post-transcriptional regulation via microRNAs (miRNA) and the recently discovered Piwi interacting RNAs (piRNA). Large expression differences were found for MDS-associated and novel miRNAs, including 48 sequences matching miRNA star (miRNA*) motifs. The detected species were predicted to regulate disease stage specific molecular functions and pathways, including apoptosis and response to DNA damage. In high-grade MDS, results suggested extensive post-translation editing via transfer RNAs (tRNAs), providing a potential link to reduced apoptosis, a hallmark of this disease stage. Bioinformatics analysis confirmed important regulatory roles for MDS-linked miRNAs and TFs, and strengthened the biological significance of miRNA*. The "RNA polymerase II promoters" were identified as the most tightly controlled biological function. We suggest their control by a miRNA-dominated feedback loop, which might be linked to the dramatically different miRNA amounts seen between low and high-grade MDS.
Discussion
The presented results provide novel findings that form a basis for further investigations of diagnostic biomarkers, targeted therapies and MDS pathogenesis.
doi:10.1186/1755-8794-4-19
PMCID: PMC3060843  PMID: 21342535
22.  Comprehensive genetic assessment of a functional TLR9 promoter polymorphism: no replicable association with asthma or asthma-related phenotypes 
BMC Medical Genetics  2011;12:26.
Background
Prior studies suggest a role for a variant (rs5743836) in the promoter of toll-like receptor 9 (TLR9) in asthma and other inflammatory diseases. We performed detailed genetic association studies of the functional variant rs5743836 with asthma susceptibility and asthma-related phenotypes in three independent cohorts.
Methods
rs5743836 was genotyped in two family-based cohorts of children with asthma and a case-control study of adult asthmatics. Association analyses were performed using chi square, family-based and population-based testing. A luciferase assay was performed to investigate whether rs5743836 genotype influences TLR9 promoter activity.
Results
Contrary to prior reports, rs5743836 was not associated with asthma in any of the three cohorts. Marginally significant associations were found with FEV1 and FVC (p = 0.003 and p = 0.008, respectively) in one of the family-based cohorts, but these associations were not significant after correcting for multiple comparisons. Higher promoter activity of the CC genotype was demonstrated by luciferase assay, confirming the functional importance of this variant.
Conclusion
Although rs5743836 confers regulatory effects on TLR9 transcription, this variant does not appear to be an important asthma-susceptibility locus.
doi:10.1186/1471-2350-12-26
PMCID: PMC3048492  PMID: 21324137
23.  Suppression of Aurora-A oncogenic potential by c-Myc downregulation 
Experimental & Molecular Medicine  2010;42(11):759-767.
The abnormality of serine/threonine kinase Aurora-A is seen in many types of cancers. Although in physiological context it has been shown to play a vital role in cellular mitosis, how this oncogene contributes to tumorigenesis remains unclear. Here we demonstrate that Aurora-A overexpression enhances both the expression level and transcriptional activity of c-Myc. The inhibition of c-Myc expression by RNA interference significantly impaired the oncogenic potential of Aurora-A, resulting in attenuated cellular proliferation and transformation rates as well as fewer centrosomal aberrations. Furthermore, downregulation of c-Myc effectively overcame Aurora-A-induced resistance to cisplatin in esophageal cancer cells. Taken together, our results suggest an important role for c-Myc in mediating the oncogenic activity of Aurora-A, which may in turn allow for future targeting of c-Myc as a potential therapeutic strategy for tumors with Aurora-A overexpression.
doi:10.3858/emm.2010.42.11.077
PMCID: PMC2992855  PMID: 20890087
aurora kinase; neoplasms; proto-oncogene proteins c-myc; RNA interference
24.  MicroRNA-Integrated and Network-Embedded Gene Selection with Diffusion Distance 
PLoS ONE  2010;5(10):e13748.
Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and on the coordinated expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measure only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (in certain cases critical) limitation, which is ignored in almost all existing studies, we design a new strategy to integrate microarray data with information on microRNA, the major post-transcriptional regulatory factor. We also handle the challenges posed by the gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely together due to signal transduction and that two genes may be functionally connected through multiple paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, directly or indirectly connected, according to the multiple paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
doi:10.1371/journal.pone.0013748
PMCID: PMC2966417  PMID: 21060785
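The idea of diffusion distance in entry 24 can be sketched on a toy network. The code below uses the standard random-walk definition (compare the t-step transition distributions of two nodes, weighted by the inverse stationary distribution); whether the paper uses exactly this form is an assumption, and the five-gene network, function names, and step count are hypothetical.

```python
def transition_matrix(adj):
    """Row-normalise a symmetric, non-negative adjacency matrix into random-walk probabilities."""
    return [[w / sum(row) for w in row] for row in adj]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def diffusion_distance(adj, i, j, steps=3):
    """Distance between nodes i and j after `steps` random-walk steps.

    Assumes every node has at least one edge, so no row sums to zero.
    """
    p = transition_matrix(adj)
    pt = p
    for _ in range(steps - 1):
        pt = mat_mul(pt, p)
    total = sum(sum(row) for row in adj)
    stationary = [sum(row) / total for row in adj]  # degree-proportional weights
    n = len(adj)
    return sum((pt[i][k] - pt[j][k]) ** 2 / stationary[k] for k in range(n)) ** 0.5


# Hypothetical network: genes 0 and 1 share no edge but are linked via genes 2 and 3;
# gene 4 hangs off gene 3 only.
adj = [[0, 0, 1, 1, 0],
       [0, 0, 1, 1, 0],
       [1, 1, 0, 0, 0],
       [1, 1, 0, 0, 1],
       [0, 0, 0, 1, 0]]
d_01 = diffusion_distance(adj, 0, 1)
d_04 = diffusion_distance(adj, 0, 4)
```

Genes 0 and 1 have no direct interaction, yet their diffusion distance is zero because every path out of them is shared, while gene 4 sits measurably farther from gene 0. This is the sense in which indirectly connected genes can still be scored as close collaborators.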
25.  Predicting enzyme targets for cancer drugs by profiling human Metabolic reactions in NCI-60 cell lines 
BMC Bioinformatics  2010;11:501.
Background
Drugs can influence the whole metabolic system by targeting enzymes which catalyze metabolic reactions. The existence of interactions between drugs and metabolic reactions suggests a potential way to discover drug targets.
Results
In this paper, we present a computational method to predict new targets for approved anti-cancer drugs by exploring drug-reaction interactions. We construct a Drug-Reaction Network to provide a global view of drug-reaction interactions and drug-pathway interactions. The recent reconstruction of the human metabolic network and the development of flux analysis approaches make it possible to predict each metabolic reaction's cell line-specific flux state based on cell line-specific gene expression. We first profile each reaction by its flux states in NCI-60 cancer cell lines, and then propose a kernel k-nearest neighbor model to predict related metabolic reactions and enzyme targets for approved cancer drugs. We also integrate the target structure data with reaction flux profiles to predict drug targets, and the area under the curve can reach 0.92.
Conclusions
Cross-validation of the methods with and without the metabolic network indicates that the former is significantly better than the latter. Further experiments show the synergism of reaction flux profiles and target structure for drug target prediction. They also imply a significant contribution of the metabolic network to drug target prediction. Finally, we apply our method to predict new reactions and possible enzyme targets for cancer drugs.
doi:10.1186/1471-2105-11-501
PMCID: PMC2964682  PMID: 20932284
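A kernel k-nearest neighbor classifier of the kind mentioned in entry 25 can be sketched as follows. It ranks training items by the distance induced by a kernel, d²(x, y) = K(x, x) − 2K(x, y) + K(y, y), and takes a majority vote. The RBF kernel, the binary flux-state profiles, and the labels are illustrative assumptions, not the paper's data or kernel choice.

```python
import math
from collections import Counter


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; one common choice, assumed here for illustration."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance_sq(x, y, kernel):
    """Squared distance between x and y in the feature space induced by `kernel`."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


def kernel_knn_predict(train, query, k=3, kernel=rbf_kernel):
    """Majority vote among the k training items closest to `query` in kernel space."""
    ranked = sorted(train, key=lambda item: kernel_distance_sq(item[0], query, kernel))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical flux-state profiles across four cell lines (1 = active, 0 = inactive).
train = [((1, 1, 0, 1), "related"),
         ((1, 1, 0, 0), "related"),
         ((1, 0, 0, 1), "related"),
         ((0, 0, 1, 0), "unrelated"),
         ((0, 1, 1, 0), "unrelated"),
         ((0, 0, 1, 1), "unrelated")]
label_a = kernel_knn_predict(train, (1, 0, 0, 0))
label_b = kernel_knn_predict(train, (0, 0, 1, 0))
```

Swapping `rbf_kernel` for another positive-definite kernel changes the notion of similarity without changing the voting logic, which is the main appeal of the kernelized form.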
