Results 1-12 (12)
 

1.  Correction: Coevolution Analysis of HIV-1 Envelope Glycoprotein Complex 
PLoS ONE  2015;10(12):e0145974.
doi:10.1371/journal.pone.0145974
PMCID: PMC4689572  PMID: 26699336
2.  Coevolution Analysis of HIV-1 Envelope Glycoprotein Complex 
PLoS ONE  2015;10(11):e0143245.
The HIV-1 Env spike is the main protein complex that facilitates HIV-1 entry into CD4+ host cells. HIV-1 entry is a multistep process that is not yet completely understood. This process involves several protein-protein interactions between HIV-1 Env and a variety of host cell receptors, along with many conformational changes within the spike. Owing to its high mutation rate and plasticity, HIV-1 Env has developed strategies to escape immense immune pressure and entry inhibitors. We applied a coevolution and residue-residue contact detecting method to identify coevolution patterns within HIV-1 Env protein sequences representing all group M subtypes. We identified 424 coevolving residue pairs within HIV-1 Env. The majority of predicted pairs are residue-residue contacts and are proximal in 3D structure. Furthermore, many of the detected pairs have functional implications due to contributions in either CD4 or coreceptor binding, or variable loop, gp120-gp41, and interdomain interactions. This study provides a new dimension of information in HIV research. The identified residue couplings may be important not only in assisting gp120 and gp41 coordinate structure prediction, but also in designing new and effective entry inhibitors that incorporate mutation patterns of HIV-1 Env.
doi:10.1371/journal.pone.0143245
PMCID: PMC4651434  PMID: 26579711
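The abstract of entry 2 does not name the coevolution method it applied. As a stand-in only, the sketch below scores two alignment columns by mutual information, one of the simplest coevolution measures. The function names and the toy alignment are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Return the residues found at one position across all sequences."""
    return [seq[index] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    High values mean that the residue at one position is informative
    about the residue at the other, a crude signal of coevolution.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Hypothetical toy alignment (assumption: not HIV-1 Env data).
toy_alignment = ["AKL", "AKV", "TEL", "TEV"]

# Positions 0 and 1 always change together; positions 0 and 2 vary independently.
coupled = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
independent = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
```

Plain mutual information is known to be confounded by phylogeny and by indirect couplings, so a score like this is a first-pass filter at best.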
5.  Systems biology analysis reveals NFAT5 as a novel biomarker and master regulator of inflammatory breast cancer 
Background
Inflammatory breast cancer (IBC) is the rarest and most aggressive variant of breast cancer (BC); however, only a limited number of specific gene signatures with low generalization abilities are available, and few reliable biomarkers help to improve IBC classification into a molecularly distinct phenotype. We applied a network-based strategy to gain insight into master regulators (MRs) linked to IBC pathogenesis.
Methods
In-silico modeling and the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe) were employed on IBC/non-IBC (nIBC) gene expression data (n = 197) to identify novel master regulators connected to the IBC phenotype. Pathway enrichment analysis was used to characterize predicted targets of candidate genes. The expression pattern of the most significant MRs was then evaluated by immunohistochemistry (IHC) in two independent cohorts of IBCs (n = 39) and nIBCs (n = 82) and normal breast tissues (n = 15) spotted on tissue microarrays. The staining pattern of non-neoplastic mammary epithelial cells was used as a normal control.
Results
Using in-silico modeling with a network-based strategy, we identified three top enriched MRs (NFAT5, CTNNB1 or β-catenin, and MGA) strongly linked to the IBC phenotype. By IHC assays, we found that IBC patients displayed a higher number of NFAT5-positive cases than nIBC (69.2% vs. 19.5%; p-value = 2.79 × 10⁻⁷). Accordingly, the majority of NFAT5-positive IBC samples revealed an aberrant nuclear expression in comparison with nIBC samples (70% vs. 12.5%; p-value = 0.000797). NFAT5 nuclear accumulation occurs regardless of WNT/β-catenin activated signaling in a substantial portion of IBCs, suggesting that NFAT5 pathway activation may have a relevant role in IBC pathogenesis. Accordingly, cytoplasmic NFAT5 and membranous β-catenin expression were preferentially linked to nIBC, accounting for the better prognosis of this phenotype.
Conclusions
We provide evidence that NFAT-signaling pathway activation could help to identify aggressive forms of BC and potentially be a guide to assignment of phenotype-specific therapeutic agents. The NFAT5 transcription factor might be developed into routine clinical practice as a putative biomarker of IBC phenotype.
Electronic supplementary material
The online version of this article (doi:10.1186/s12967-015-0492-2) contains supplementary material, which is available to authorized users.
doi:10.1186/s12967-015-0492-2
PMCID: PMC4438533  PMID: 25928084
Inflammatory breast cancer; Gene regulatory network; Systems biology; NFAT5; MGA; CTNNB1
6.  Model Selection Emphasises the Importance of Non-Chromosomal Information in Genetic Studies 
PLoS ONE  2015;10(1):e0117014.
Ever since the case of the missing heritability was highlighted some years ago, scientists have been investigating various possible explanations for the issue. However, none of these explanations include non-chromosomal genetic information. Here we describe explicitly how chromosomal and non-chromosomal modifiers collectively influence the heritability of a trait, in this case, the growth rate of yeast. Our results show that the non-chromosomal contribution can be large, adding another dimension to the estimation of heritability. We also discovered, combining the strength of LASSO with model selection, that the interaction of chromosomal and non-chromosomal information is essential in describing phenotypes.
doi:10.1371/journal.pone.0117014
PMCID: PMC4308103  PMID: 25626013
7.  Multiple graph regularized protein domain ranking 
BMC Bioinformatics  2012;13:307.
Background
Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods.
Results
To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods.
Conclusion
The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
doi:10.1186/1471-2105-13-307
PMCID: PMC3583823  PMID: 23157331
8.  Efficient Feature Selection and Multiclass Classification with Integrated Instance and Model Based Learning 
Multiclass classification and feature (variable) selection are commonly encountered in many biological and medical applications. However, extending binary classification approaches to multiclass problems is not trivial. Instance-based methods such as K nearest neighbors (KNN) extend naturally to multiclass problems and usually perform well with unbalanced data, but they suffer from the curse of dimensionality: their performance degrades when they are applied to high dimensional data. On the other hand, model-based methods such as logistic regression require the decomposition of the multiclass problem into several binary problems with one-vs.-one or one-vs.-rest schemes. Even though they can be applied to high dimensional data with L1 or Lp penalized methods, such approaches can only select independent features, and the features selected for different binary problems are usually different. They also produce unbalanced classification problems under the one-vs.-rest scheme even if the original multiclass problem is balanced.
By combining instance-based and model-based learning, we propose an efficient learning method with integrated KNN and constrained logistic regression (KNNLog) for simultaneous multiclass classification and feature selection. Our proposed method simultaneously minimizes the intra-class distance and maximizes the interclass distance with fewer estimated parameters. It is very efficient for problems with small sample size and unbalanced classes, a case common in many real applications. In addition, our model-based feature selection methods can identify highly correlated features simultaneously avoiding the multiplicity problem due to multiple tests. The proposed method is evaluated with simulation and real data including one unbalanced microRNA dataset for leukemia and one multiclass metagenomic dataset from the Human Microbiome Project (HMP). It performs well with limited computational experiments.
doi:10.4137/EBO.S9407
PMCID: PMC3347893  PMID: 22577297
feature selection; multiclass classification; statistical learning; high-dimensional data
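To illustrate the abstract's point that instance-based methods extend naturally to multiclass problems, here is a minimal K-nearest-neighbor classifier. It covers only the plain KNN ingredient, not the authors' KNNLog method, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    Nothing here is specific to two classes: the vote works for any
    number of labels, so no one-vs.-rest decomposition is needed.
    """
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical three-class toy data (assumption: not from the paper).
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1), (11, 0)]
labels = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

predictions = [knn_predict(points, labels, q) for q in [(0.5, 0.5), (5.5, 5.5), (10.2, 0.3)]]
```

Because every feature enters the Euclidean distance with equal weight, irrelevant features dilute the vote as dimensionality grows, which is the weakness the abstract attributes to instance-based methods.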
9.  Gene Expression Data Classification With Kernel Principal Component Analysis 
One important feature of the gene expression data is that the number of genes M far exceeds the number of samples N. Standard statistical methods do not work well when N < M. Development of new methodologies or modification of existing methodologies is needed for the analysis of the microarray data. In this paper, we propose a novel analysis procedure for classifying the gene expression data. This procedure involves dimension reduction using kernel principal component analysis (KPCA) and classification with logistic regression (discrimination). KPCA is a generalization and nonlinear version of principal component analysis. The proposed algorithm was applied to five different gene expression datasets involving human tumor samples. Comparison with other popular classification methods such as support vector machines and neural networks shows that our algorithm is very promising in classifying gene expression data.
doi:10.1155/JBB.2005.155
PMCID: PMC1184105  PMID: 16046821
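The dimension-reduction step named in entry 9 can be sketched as follows. This is a minimal illustration under stated assumptions: it uses an RBF kernel (the abstract does not name a kernel), extracts only the leading component by power iteration, and omits the logistic regression step. All function names and the toy data are hypothetical.

```python
from math import exp, sqrt


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def centered_kernel_matrix(points, gamma):
    """Kernel matrix centered in feature space (K - 1K - K1 + 1K1)."""
    n = len(points)
    k = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    row_mean = [sum(row) / n for row in k]
    total_mean = sum(row_mean) / n
    # K is symmetric, so the column means equal the row means.
    return [
        [k[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def leading_component_scores(kc, iterations=500):
    """Project the training points onto the first kernel principal component.

    Power iteration finds the top eigenpair (lam, v) of the centered
    kernel matrix; the projection of point i is then sqrt(lam) * v[i].
    """
    n = len(kc)
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    return [sqrt(eigenvalue) * x for x in v]


# Hypothetical toy data: two well-separated groups of one-dimensional samples.
samples = [(0.0,), (0.1,), (0.2,), (3.0,), (3.1,), (3.2,)]
scores = leading_component_scores(centered_kernel_matrix(samples, gamma=1.0))
```

On this toy data the first component separates the two groups by sign. In a procedure like the one the abstract describes, scores of this kind would replace the original features as inputs to a classifier.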
10.  Data Mining in Genomics and Proteomics 
doi:10.1155/JBB.2005.63
PMCID: PMC1184057  PMID: 16046810
11.  Functional Clustering Algorithm for High-Dimensional Proteomics Data 
Clustering proteomics data is a challenging problem for any traditional clustering algorithm. Usually, the number of samples is much smaller than the number of protein peaks. A clustering algorithm that does not take into consideration the number of features or variables (here, the number of peaks) is needed. An innovative hierarchical clustering algorithm may be a good approach. We propose here a new dissimilarity measure for hierarchical clustering combined with functional data analysis. We present a specific application of functional data analysis (FDA) to a high-throughput proteomics study. The performance of the proposed algorithm is compared with that of two popular dissimilarity measures in the clustering of samples from normal and human T-cell leukemia virus type 1 (HTLV-1)-infected patients.
doi:10.1155/JBB.2005.80
PMCID: PMC1184055  PMID: 16046812
12.  Postgenomics: Proteomics and Bioinformatics in Cancer Research 
Now that the human genome is completed, the characterization of the proteins encoded by the sequence remains a challenging task. The study of the complete protein complement of the genome, the “proteome,” referred to as proteomics, will be essential if new therapeutic drugs and new disease biomarkers for early diagnosis are to be developed. Research efforts are already underway to develop the technology necessary to compare the specific protein profiles of diseased versus nondiseased states. These technologies provide a wealth of information and rapidly generate large quantities of data. Processing the large amounts of data will lead to useful predictive mathematical descriptions of biological systems which will permit rapid identification of novel therapeutic targets and identification of metabolic disorders. Here, we present an overview of the current status and future research approaches in defining the cancer cell's proteome in combination with different bioinformatics and computational biology tools toward a better understanding of health and disease.
doi:10.1155/S1110724303209207
PMCID: PMC514267  PMID: 14615629
