1.  Evaluation of Protein Dihedral Angle Prediction Methods 
PLoS ONE  2014;9(8):e105667.
Tertiary structure prediction of a protein from its amino acid sequence is one of the major challenges in the field of bioinformatics. The hierarchical approach is one of the persuasive techniques used for predicting protein tertiary structure, especially in the absence of homologous protein structures. In the hierarchical approach, intermediate states such as secondary structure, dihedral angles and Cα-Cα distance bounds are predicted. These intermediate states are used to restrain the protein backbone and assist its correct folding. In recent years, several methods have been developed for predicting the dihedral angles of a protein, but it is difficult to conclude which method is better than the others. In this study, we benchmarked the performance of the dihedral prediction methods ANGLOR and SPINE X on various datasets, including independent datasets. The TANGLE dihedral prediction method was not benchmarked (because its standalone version was unavailable) and was compared with SPINE X and ANGLOR only on the ANGLOR dataset, on which TANGLE has reported its results. It was observed that SPINE X performed better than ANGLOR and TANGLE, especially in the prediction of dihedral angles of glycine and proline residues. The analysis suggested that angle shifting was the foremost reason for the better performance of SPINE X. We further evaluated the performance of the methods on the independent ccPDB30 dataset and observed that SPINE X performed better than ANGLOR.
PMCID: PMC4148315  PMID: 25166857
2.  In Silico Approach for Predicting Toxicity of Peptides and Proteins 
PLoS ONE  2013;8(9):e73957.
Over the past few decades, scientific research has focused on developing peptide/protein-based therapies to treat various diseases. With several advantages over small molecules, including high specificity, high penetration and ease of manufacturing, peptides have emerged as promising therapeutic molecules against many diseases. However, one of the bottlenecks in peptide/protein-based therapy is toxicity. Therefore, in the present study, we developed in silico models for predicting the toxicity of peptides and proteins.
We obtained toxic peptides having 35 or fewer residues from various databases for developing prediction models. Non-toxic or random peptides were obtained from SwissProt and TrEMBL. It was observed that certain residues like Cys, His, Asn, and Pro are abundant as well as preferred at various positions in toxic peptides. We developed models based on machine learning technique and quantitative matrix using various properties of peptides for predicting toxicity of peptides. The performance of dipeptide-based model in terms of accuracy was 94.50% with MCC 0.88. In addition, various motifs were extracted from the toxic peptides and this information was combined with dipeptide-based model for developing a hybrid model. In order to evaluate the over-optimization of the best model based on dipeptide composition, we evaluated its performance on independent datasets and achieved accuracy around 90%. Based on above study, a web server, ToxinPred has been developed, which would be helpful in predicting (i) toxicity or non-toxicity of peptides, (ii) minimum mutations in peptides for increasing or decreasing their toxicity, and (iii) toxic regions in proteins.
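The dipeptide-based model mentioned above takes a fixed-length feature vector as input. The abstract does not say how that vector is computed, so the following is only a minimal sketch of the usual definition of dipeptide composition (the percentage of each of the 400 ordered residue pairs). The function name `dipeptide_composition` and the choice to skip pairs containing non-standard residues are assumptions made for illustration, not details taken from ToxinPred.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered residue pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the percentage of each of the 400 dipeptides in a peptide.

    Hypothetical helper: overlapping pairs are counted, and pairs that
    contain a non-standard residue are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short, or no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    features = dipeptide_composition("ACAC")
    print(len(features), features[DIPEPTIDES.index("AC")])
```

A vector of this kind can be passed to any standard classifier; every peptide maps to the same 400 dimensions regardless of its length.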
ToxinPred is a unique in silico method of its kind, which will be useful in predicting toxicity of peptides/proteins. In addition, it will be useful in designing least toxic peptides and discovering toxic regions in proteins. We hope that the development of ToxinPred will provide momentum to peptide/protein-based drug discovery (
PMCID: PMC3772798  PMID: 24058508
3.  QSAR-Based Models for Designing Quinazoline/Imidazothiazoles/Pyrazolopyrimidines Based Inhibitors against Wild and Mutant EGFR 
PLoS ONE  2014;9(7):e101079.
Overexpression of EGFR is responsible for causing a number of cancers, including lung cancer, as it activates various downstream signaling pathways. Thus, it is important to control EGFR function in order to treat cancer patients. It is well established that inhibiting ATP binding within the EGFR kinase domain regulates its function. The existing quinazoline-derivative-based drugs used for treating lung cancer inhibit wild-type EGFR. In this study, we have made a systematic attempt to develop QSAR models for designing quinazoline derivatives that could inhibit wild-type EGFR and imidazothiazole/pyrazolopyrimidine derivatives against mutant EGFR. Three types of prediction methods have been developed to design inhibitors against EGFR (wild, mutant and both). First, we developed models for predicting inhibitors against wild-type EGFR by training and testing on a dataset containing 128 quinazoline-based inhibitors. This dataset was divided into two subsets, wild_train and wild_valid, containing 103 and 25 inhibitors respectively. The models were trained and tested on the wild_train dataset, while performance was evaluated on the wild_valid (validation) dataset. We achieved a maximum correlation of 0.90 between predicted and experimentally determined inhibition (IC50) on the validation dataset. Second, we developed models for predicting inhibitors against mutant EGFR (L858R) on the mutant_train and mutant_valid datasets and achieved maximum correlations ranging from 0.834 to 0.850 on these datasets. Finally, an integrated hybrid model was developed on a dataset containing wild and mutant inhibitors, which achieved maximum correlations ranging from 0.761 to 0.850 on different datasets. In order to promote open source drug discovery, we developed a webserver for designing inhibitors against wild and mutant EGFR along with providing standalone ( and Galaxy ( version of software. We hope our webserver ( will play a vital role in designing new anticancer drugs.
PMCID: PMC4081576  PMID: 24992720
4.  ParaPep: a web resource for experimentally validated antiparasitic peptide sequences and their structures 
ParaPep is a repository of antiparasitic peptides, which provides comprehensive information related to experimentally validated antiparasitic peptide sequences and their structures. The data were collected and compiled from published research papers, patents and from various databases. The current release of ParaPep holds 863 entries among which 519 are unique peptides. In addition to peptides having natural amino acids, ParaPep also consists of peptides having d-amino acids and chemically modified residues. In ParaPep, most of the peptides have been evaluated for growth inhibition of various species of Plasmodium, Leishmania and Trypanosoma. We have provided comprehensive information about these peptides that include peptide sequence, chemical modifications, stereochemistry, antiparasitic activity, origin, nature of peptide, assay types, type of parasite, mode of action and hemolytic activity. Structures of peptides consisting of natural, as well as modified amino acids have been determined using state-of-the-art software, PEPstr. To facilitate users, various user-friendly web tools, for data fetching, analysis and browsing, have been integrated. We hope that ParaPep will be advantageous in designing therapeutic peptides against parasitic diseases.
Database URL:
PMCID: PMC4054663  PMID: 24923818
5.  Designing of promiscuous inhibitors against pancreatic cancer cell lines 
Scientific Reports  2014;4:4668.
Pancreatic cancer remains the most devastating disease, with the worst prognosis. There is a pressing need to accelerate the drug discovery process to identify new effective drug candidates against pancreatic cancer. We have developed QSAR models for predicting promiscuous inhibitors using pharmacological data. Our models achieved a maximum Pearson correlation coefficient of 0.86 when evaluated using 10-fold cross-validation. Our models also successfully validated the drug-to-oncogene relationship; we further used these models to screen FDA-approved drugs and tested them in vitro. We have integrated these models into a webserver named DiPCell, which will be useful for screening and designing novel promiscuous drug molecules. We have also identified the most and least effective drugs for pancreatic cancer cell lines. In addition, we have identified resistant pancreatic cancer cell lines, which need further investigation to shed light on the resistance mechanism in pancreatic cancer.
PMCID: PMC3985076  PMID: 24728108
6.  Herceptin Resistance Database for Understanding Mechanism of Resistance in Breast Cancer Patients 
Scientific Reports  2014;4:4483.
The monoclonal antibody Trastuzumab (Herceptin) is considered a frontline therapy for Her2-positive breast cancer patients. However, it is not effective in several patients owing to acquired or de novo resistance. In the last decade, several assays have been performed to understand the mechanism of Herceptin resistance with or without supplementary drugs. This manuscript describes a database, HerceptinR, developed for understanding the mechanism of resistance at the genetic level. HerceptinR maintains information about 2500 assays performed against various breast cancer cell lines (BCCs) for improving sensitivity to Herceptin with or without supplementary drugs. In order to understand Herceptin resistance at the genetic level, we integrated genomic data of BCCs that include expression, mutations and copy number variations in different cell lines. HerceptinR will play a vital role in i) designing biomarkers to identify patients eligible for Herceptin treatment and ii) identifying an appropriate supplementary drug for a particular patient. HerceptinR is available at
PMCID: PMC3967150  PMID: 24670875
7.  PCMdb: Pancreatic Cancer Methylation Database 
Scientific Reports  2014;4:4197.
Pancreatic cancer is the fifth most aggressive malignancy and urgently requires new biomarkers to facilitate early detection. For providing impetus to the biomarker discovery, we have developed Pancreatic Cancer Methylation Database (PCMDB,, a comprehensive resource dedicated to methylation of genes in pancreatic cancer. Data was collected and compiled manually from published literature. PCMdb has 65907 entries for methylation status of 4342 unique genes. In PCMdb, data was compiled for both cancer cell lines (53565 entries for 88 cell lines) and cancer tissues (12342 entries for 3078 tissue samples). Among these entries, 47.22% entries reported a high level of methylation for the corresponding genes while 10.87% entries reported low level of methylation. PCMdb covers five major subtypes of pancreatic cancer; however, most of the entries were compiled for adenocarcinomas (88.38%) and mucinous neoplasms (5.76%). A user-friendly interface has been developed for data browsing, searching and analysis. We anticipate that PCMdb will be helpful for pancreatic cancer biomarker discovery.
PMCID: PMC3935225  PMID: 24569397
8.  Correction: Hybrid Approach for Predicting Coreceptor Used by HIV-1 from Its V3 Loop Amino Acid Sequence 
PLoS ONE  2013;8(11):10.1371/annotation/5c57dcdc-e5d9-4999-a7d0-32004427cba5.
PMCID: PMC3821749  PMID: 24244254
9.  Hemolytik: a database of experimentally determined hemolytic and non-hemolytic peptides 
Nucleic Acids Research  2013;42(Database issue):D444-D449.
Hemolytik ( is a manually curated database of experimentally determined hemolytic and non-hemolytic peptides. Data were compiled from a large number of published research articles and various databases like Antimicrobial Peptide Database, Collection of Anti-microbial Peptides, Dragon Antimicrobial Peptide Database and Swiss-Prot. The current release of the Hemolytik database contains ∼3000 entries that include ∼2000 unique peptides whose hemolytic activities were evaluated on erythrocytes isolated from as many as 17 different sources. Each entry in Hemolytik provides comprehensive information about a peptide, such as its name, sequence, origin, reported function, properties such as chirality, type (linear or cyclic) and end modifications, as well as details pertaining to its hemolytic activity. In addition, the tertiary structure of each peptide has been predicted, and secondary structure states have been assigned. To assist the scientific community, a user-friendly interface has been developed with various tools for data searching and analysis. We hope Hemolytik will be useful for researchers working in the field of designing therapeutic peptides.
PMCID: PMC3964980  PMID: 24174543
10.  In silico Platform for Prediction of N-, O- and C-Glycosites in Eukaryotic Protein Sequences 
PLoS ONE  2013;8(6):e67008.
Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various cellular biological functions like protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. A large number of eukaryotic glycoproteins also have therapeutic and potential technology applications. Therefore, characterization and analysis of glycosites (glycosylated residues) in these proteins is of great interest to biologists. To cater to these needs, a number of in silico tools have been developed over the years; however, a need for even better prediction tools remains. Therefore, in this study we have developed a new webserver, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins using two larger datasets, namely, standard and advanced datasets. In the standard datasets, no two glycosylated proteins are more than 40% similar; the advanced datasets are highly non-redundant, with no two glycosites’ patterns (as defined in methods) having more than 60% similarity. Further, based on our results with several algorithms developed using different machine-learning techniques, we found Support Vector Machine (SVM) to be the optimum tool for developing glycosite prediction models. Accordingly, using our more stringent and non-redundant advanced datasets, the SVM-based models developed in this study achieved a prediction accuracy of 84.26%, 86.87% and 91.43% with corresponding MCC of 0.54, 0.20 and 0.78, for N-, O- and C-linked glycosites, respectively. The best performing models trained on advanced datasets were then implemented as a user-friendly web server GlycoEP ( Additionally, this server provides prediction models developed on standard datasets and allows users to scan sequons in input protein sequences.
PMCID: PMC3695939  PMID: 23840574
11.  Improved Method for Linear B-Cell Epitope Prediction Using Antigen’s Primary Sequence 
PLoS ONE  2013;8(5):e62216.
One of the major challenges in designing a peptide-based vaccine is the identification of antigenic regions in an antigen that can stimulate a B-cell response, also called B-cell epitopes. In the past, several methods have been developed for the prediction of conformational and linear (or continuous) B-cell epitopes. However, the existing methods for predicting linear B-cell epitopes are far from perfect. In this study, an attempt has been made to develop an improved method for predicting linear B-cell epitopes. We have retrieved experimentally validated B-cell epitopes as well as non B-cell epitopes from the Immune Epitope Database and derived two types of datasets called the Lbtope_Variable and Lbtope_Fixed length datasets. The Lbtope_Variable dataset contains 14876 B-cell epitopes and 23321 non-epitopes of variable length, whereas the Lbtope_Fixed length dataset contains 12063 B-cell epitopes and 20589 non-epitopes of fixed length. We also evaluated the performance of models on the above datasets after removing highly identical peptides from the datasets. In addition, we have derived a third dataset, Lbtope_Confirm, having 1042 epitopes and 1795 non-epitopes, where each epitope or non-epitope has been experimentally validated in at least two studies. A number of models have been developed to discriminate epitopes and non-epitopes using different machine-learning techniques like Support Vector Machine and K-Nearest Neighbor. We achieved accuracy from ∼54% to 86% using diverse features like binary profile, dipeptide composition and AAP (amino acid pair) profile. In this study, for the first time, experimentally validated non B-cell epitopes have been used for developing a method for predicting linear B-cell epitopes. In previous studies, random peptides have been used as non B-cell epitopes. In order to provide a service to the scientific community, a web server LBtope has been developed for predicting and designing B-cell epitopes (
PMCID: PMC3646881  PMID: 23667458
12.  Hybrid Approach for Predicting Coreceptor Used by HIV-1 from Its V3 Loop Amino Acid Sequence 
PLoS ONE  2013;8(4):e61437.
HIV-1 infects the host cell by interacting with the primary receptor CD4 and a coreceptor, CCR5 or CXCR4. Maraviroc, a CCR5 antagonist, binds to the CCR5 receptor. Thus, it is important to identify the coreceptor used by the HIV strains dominating in the patient. In the past, a number of experimental assays and in silico techniques have been developed for predicting coreceptor tropism. The prediction accuracy of these methods is excellent when predicting CCR5 (R5) tropic sequences but is relatively poor for CXCR4 (X4) tropic sequences. Therefore, any new method for accurate determination of coreceptor usage would be of paramount importance to the successful management of HIV-infected individuals.
The dataset used in this study comprised 1799 R5-tropic and 598 X4-tropic third variable (V3) sequences of HIV-1. We compared the amino acid composition of both types of V3 sequences and observed that certain types of residues, e.g., Asparagine and Isoleucine, were preferred in R5-tropic sequences whereas residues like Lysine, Arginine, and Tryptophan were preferred in X4-tropic sequences. Initially, Support Vector Machine-based models were developed using amino acid composition, dipeptide composition, and split amino acid composition, which achieved accuracy up to 90%. We used BLAST to discriminate R5- and X4-tropic sequences and correctly predicted 93.16% of R5- and 75.75% of X4-tropic sequences. In order to improve the prediction accuracy, a Hybrid model was developed that achieved 91.66% sensitivity, 81.77% specificity, 89.19% accuracy and 0.72 Matthews Correlation Coefficient. The performance of our models was also evaluated on an independent dataset (256 R5- and 81 X4-tropic sequences) and achieved maximum accuracy of 84.87% with Matthews Correlation Coefficient 0.63.
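The four figures reported for the hybrid model (sensitivity, specificity, accuracy and Matthews Correlation Coefficient) all derive from a single confusion matrix. The sketch below shows the standard formulas. The counts in the usage example are not from the paper: they are back-calculated by rounding the reported percentages against the 1799 R5 and 598 X4 sequences, under the assumption that R5 is the positive class. The function name `classification_metrics` is hypothetical.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification metrics from raw counts.

    tp/fn: positives predicted correctly/incorrectly.
    tn/fp: negatives predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Illustrative counts only (assumption): reconstructed from the
    # reported percentages, treating R5-tropic sequences as positives.
    metrics = classification_metrics(tp=1649, fn=150, tn=489, fp=109)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Because MCC uses all four cells of the confusion matrix, it is less flattering than accuracy on imbalanced data such as this, where R5 sequences outnumber X4 sequences roughly three to one.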
This study describes a highly efficient method for predicting HIV-1 coreceptor usage from V3 sequences. In order to provide a service to the scientific community, a webserver HIVcoPred was developed ( for predicting the coreceptor usage.
PMCID: PMC3626595  PMID: 23596523
13.  Computational approach for designing tumor homing peptides 
Scientific Reports  2013;3:1607.
Tumor homing peptides are small peptides that home specifically to tumor and tumor associated microenvironment i.e. tumor vasculature, after systemic delivery. Keeping in mind the huge therapeutic importance of these peptides, we have made an attempt to analyze and predict tumor homing peptides. It was observed that certain types of residues are preferred in tumor homing peptides. Therefore, we developed support vector machine based models for predicting tumor homing peptides using amino acid composition and binary profiles of peptides. Amino acid composition, dipeptide composition and binary profile-based models achieved a maximum accuracy of 86.56%, 82.03%, and 84.19% respectively. These methods have been implemented in a user-friendly web server, TumorHPD. We anticipate that this method will be helpful to design novel tumor homing peptides. TumorHPD web server is freely accessible at
PMCID: PMC3617442  PMID: 23558316
14.  In silico approaches for designing highly effective cell penetrating peptides 
Cell penetrating peptides have gained much recognition as a versatile transport vehicle for the intracellular delivery of a wide range of cargoes (e.g. oligonucleotides, small molecules and proteins) that otherwise lack bioavailability, thus offering great potential as future therapeutics. Keeping in mind the therapeutic importance of these peptides, we have developed in silico methods for the prediction of cell penetrating peptides, which can be used for rapid screening of such peptides prior to their synthesis.
In the present study, support vector machine (SVM)-based models have been developed for predicting and designing highly effective cell penetrating peptides. Various features like amino acid composition, dipeptide composition, binary profile of patterns, and physicochemical properties have been used as input features. The main dataset used in this study consists of 708 peptides. In addition, we have identified various motifs in cell penetrating peptides, and used these motifs for developing a hybrid prediction model. Performance of our method was evaluated on an independent dataset and also compared with that of the existing methods.
In cell penetrating peptides, certain residues (e.g. Arg, Lys, Pro, Trp, Leu, and Ala) are preferred at specific locations. Thus, it was possible to discriminate cell-penetrating peptides from non-cell penetrating peptides based on amino acid composition. All models were evaluated using five-fold cross-validation technique. We have achieved a maximum accuracy of 97.40% using the hybrid model that combines motif information and binary profile of the peptides. On independent dataset, we achieved maximum accuracy of 81.31% with MCC of 0.63.
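Five-fold cross-validation, as used above, splits the data into five disjoint parts and uses each part once for testing while training on the other four. The abstract does not describe how the folds were built, so this is a minimal sketch under assumed choices: a seeded random shuffle, round-robin fold assignment and no class stratification. The function name `k_fold_indices` is hypothetical; the sample count of 708 is the size of the main dataset given above.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed and then dealt into k
    folds, so fold sizes differ by at most one sample.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for fold_number, (train, test) in enumerate(k_fold_indices(708), start=1):
        print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

The reported score is then typically the average of the metric over the five test folds, which is why a separate independent dataset is still needed to check for over-optimization.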
The present study demonstrates that features like amino acid composition, binary profile of patterns and motifs can be used to train an SVM classifier that can predict cell penetrating peptides with higher accuracy. The hybrid model described in this study achieved higher accuracy than the previous methods and thus may complement the existing methods. Based on the above study, a user-friendly web server, CellPPD, has been developed to help biologists predict and design CPPs with ease. CellPPD web server is freely accessible at
PMCID: PMC3615965  PMID: 23517638
Cell penetrating peptides; Drug delivery; Amino acid composition; Support vector machine
15.  CancerDR: Cancer Drug Resistance Database 
Scientific Reports  2013;3:1445.
Cancer therapies are limited by the development of drug resistance, and mutations in drug targets are one of the main reasons for developing acquired resistance. Adequate knowledge of these mutations in drug targets would help to design effective personalized therapies. Keeping this in mind, we have developed a database, “CancerDR”, which provides information on 148 anti-cancer drugs and their pharmacological profiling across 952 cancer cell lines. CancerDR provides comprehensive information about each drug target, including (i) sequence of natural variants, (ii) mutations, (iii) tertiary structure, and (iv) alignment profile of mutants/variants. A number of web-based tools have been integrated in CancerDR. This database will be very useful for identification of genetic alterations in genes encoding drug targets, and in turn the residues responsible for drug resistance. CancerDR allows users to identify promiscuous drug molecules that can kill a wide range of cancer cells. CancerDR is freely accessible at
PMCID: PMC3595698  PMID: 23486013
16.  Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information 
BMC Bioinformatics  2013;14:44.
Vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein if its tertiary structure is known. Unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in silico models for predicting vitamin interacting residues in a protein from its primary structure.
In this study, first we compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with patterns-specificity was also observed within different vitamin-classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. It suggested that protein-binding patterns of vitamins are different from other ligands, and motivated us to develop separate predictor for vitamins and their sub-classes. The four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs) have been developed. We applied various classifiers of SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk etc., as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected best performing SVM modules and obtained highest MCC of 0.53, 0.48, 0.61, 0.81 for VIRs, VAIRs, VBIRs, PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study have been trained and tested on non-redundant datasets and evaluated using five-fold cross-validation technique. The performances were also evaluated on the balanced and different independent datasets.
This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from evolutionary information of protein sequence. In order to provide service to the scientific community, we have developed web-server and standalone software VitaPred (
PMCID: PMC3577447  PMID: 23387468
Vitamin-interacting residue; Pyridoxal-5-phosphate; SVM; PSSM; VitaPred
17.  NPACT: Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database 
Nucleic Acids Research  2012;41(Database issue):D1124-D1129.
Plant-derived molecules have been highly valued by biomedical researchers and pharmaceutical companies for developing drugs, as they are thought to be optimized during evolution. Therefore, we have collected and compiled a central resource Naturally Occurring Plant-based Anti-cancer Compound-Activity-Target database (NPACT, that gathers the information related to experimentally validated plant-derived natural compounds exhibiting anti-cancerous activity (in vitro and in vivo), to complement the other databases. It currently contains 1574 compound entries, and each record provides information on their structure, manually curated published data on in vitro and in vivo experiments along with reference for users referral, inhibitory values (IC50/ED50/EC50/GI50), properties (physical, elemental and topological), cancer types, cell lines, protein targets, commercial suppliers and drug likeness of compounds. NPACT can easily be browsed or queried using various options, and an online similarity tool has also been made available. Further, to facilitate retrieval of existing data, each record is hyperlinked to similar databases like SuperNatural, Herbal Ingredients’ Targets, Comparative Toxicogenomics Database, PubChem and NCI-60 GI50 data.
PMCID: PMC3531140  PMID: 23203877
18.  GlycoPP: A Webserver for Prediction of N- and O-Glycosites in Prokaryotic Protein Sequences 
PLoS ONE  2012;7(7):e40155.
Glycosylation is one of the most abundant post-translational modifications (PTMs) required for various structure/function modulations of proteins in a living cell. Although elucidated only recently in prokaryotes, this type of PTM is present across all three domains of life. In prokaryotes, two types of protein glycan linkages are more widespread, namely, N-linked, where a glycan moiety is attached to the amide group of Asn, and O-linked, where a glycan moiety is attached to the hydroxyl group of Ser/Thr/Tyr. Owing to their biologically ubiquitous nature, significance and technology applications, the study of prokaryotic glycoproteins is a fast emerging area of research. Here we describe new Support Vector Machine (SVM) based algorithms (models) developed for predicting glycosylated residues (glycosites) with high accuracy in prokaryotic protein sequences. The models are based on binary profile of patterns, composition profile of patterns, and position-specific scoring matrix profile of patterns as training features. The study employs an extensive dataset of 107 N-linked and 116 O-linked glycosites extracted from 59 experimentally characterized glycoproteins of prokaryotes. This dataset includes validated N-glycosites from phyla Crenarchaeota, Euryarchaeota (domain Archaea), Proteobacteria (domain Bacteria) and validated O-glycosites from phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria (domain Bacteria). In view of the current understanding that glycosylation occurs on folded proteins in bacteria, hybrid models have been developed using information on predicted secondary structures and accessible surface area in various combinations with training features. Using these models, N-glycosites and O-glycosites could be predicted with an accuracy of 82.71% (MCC 0.65) and 73.71% (MCC 0.48), respectively.
An evaluation of the best performing models with 28 independent prokaryotic glycoproteins confirms the suitability of these models in predicting N- and O-glycosites in potential glycoproteins from aforementioned organisms, with reasonably high confidence. A web server GlycoPP, implementing these models is available freely at http:/
PMCID: PMC3392279  PMID: 22808107
19.  TumorHoPe: A Database of Tumor Homing Peptides 
PLoS ONE  2012;7(4):e35187.
Cancer is responsible for millions of premature deaths every year and is an economic burden on developing countries. One of the major challenges in the present era is to design drugs that specifically target tumor cells rather than normal cells. In this context, tumor homing peptides have drawn much attention. These peptides play a vital role in delivering drugs to tumor tissues with high specificity. In order to provide a service to the scientific community, we have developed a database of tumor homing peptides called TumorHoPe.
TumorHoPe is a manually curated database of experimentally validated tumor homing peptides that specifically recognize tumor cells and tumor associated microenvironment, i.e., angiogenesis. These peptides were collected and compiled from published papers, patents and databases. Current release of TumorHoPe contains 744 peptides. Each entry provides comprehensive information of a peptide that includes its sequence, target tumor, target cell, techniques of identification, peptide receptor, etc. In addition, we have derived various types of information from these peptide sequences that include secondary/tertiary structure, amino acid composition, and physicochemical properties of peptides. Peptides in this database have been found to target different types of tumors that include breast, lung, prostate, melanoma, colon, etc. These peptides have some common motifs including RGD (Arg-Gly-Asp) and NGR (Asn-Gly-Arg) motifs, which specifically recognize tumor angiogenic markers. TumorHoPe has been integrated with many web-based tools like simple/complex search, database browsing and peptide mapping. These tools allow a user to search tumor homing peptides based on their amino acid composition, charge, polarity, hydrophobicity, etc.
TumorHoPe is a unique database that provides comprehensive information about experimentally validated tumor homing peptides and their target cells. This database will be very useful in designing peptide-based drugs and drug-delivery systems. It is freely available at
PMCID: PMC3327652  PMID: 22523575
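The TumorHoPe abstract above mentions searching peptides by amino acid composition, charge and shared motifs such as RGD and NGR. The following is a minimal sketch of that kind of sequence-derived filtering, not the database's actual implementation. The function names, the charge rule (K and R counted as +1, D and E as -1, histidine ignored) and the example sequences are illustrative assumptions.

```python
from collections import Counter

# Assumption: a deliberately crude charge model (histidine and termini ignored).
POSITIVE = set("KR")
NEGATIVE = set("DE")
MOTIFS = ("RGD", "NGR")  # the two motifs named in the abstract


def composition(peptide):
    """Return the fraction of each amino acid that occurs in the peptide."""
    counts = Counter(peptide)
    return {aa: n / len(peptide) for aa, n in counts.items()}


def net_charge(peptide):
    """Count positive residues minus negative residues."""
    positives = sum(aa in POSITIVE for aa in peptide)
    negatives = sum(aa in NEGATIVE for aa in peptide)
    return positives - negatives


def find_motifs(peptide, motifs=MOTIFS):
    """Map each motif to the 0-based start positions where it occurs."""
    return {
        motif: [i for i in range(len(peptide) - len(motif) + 1)
                if peptide.startswith(motif, i)]
        for motif in motifs
    }


def search(peptides, motif=None, min_charge=None):
    """Keep peptides that contain the motif and meet the charge threshold."""
    hits = []
    for peptide in peptides:
        if motif is not None and motif not in peptide:
            continue
        if min_charge is not None and net_charge(peptide) < min_charge:
            continue
        hits.append(peptide)
    return hits


if __name__ == "__main__":
    # Illustrative sequences only; not taken from the database.
    peptides = ["CRGDKGPDC", "CNGRC", "AAAA"]
    for p in peptides:
        print(p, net_charge(p), find_motifs(p))
    print(search(peptides, motif="NGR"))
```

A real search tool would also need properties such as hydrophobicity and polarity, which require a chosen scale and are left out here.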
20.  PolysacDB: A Database of Microbial Polysaccharide Antigens and Their Antibodies 
PLoS ONE  2012;7(4):e34613.
Vaccines based on microbial cell surface polysaccharides have long been considered an attractive means to control infectious diseases. To realize this goal, detailed systematic information about the antigenic polysaccharide is necessary. However, only a few databases that provide limited knowledge in this area are available. This paper describes PolysacDB, a manually curated database of antigenic polysaccharides. We collected and compiled comprehensive information from literature and web resources about antigenic polysaccharides of microbial origin. The current version of the database has 1,554 entries of 149 different antigenic polysaccharides from 347 different microbes. Each entry provides comprehensive information about an antigenic polysaccharide, i.e., its origin, function, protocols for its conjugation to carriers, antibodies produced, details of assay systems, specificities of antibodies, proposed epitopes involved and antibody utilities. For the user's convenience, we have integrated a web interface for searching, advanced searching and browsing data in the database. This database will be useful for researchers working on polysaccharide-based vaccines. It is freely available from the URL:
PMCID: PMC3324500  PMID: 22509333
21.  ccPDB: compilation and creation of data sets from Protein Data Bank 
Nucleic Acids Research  2011;40(Database issue):D486-D489.
ccPDB is a database of data sets compiled from the literature and the Protein Data Bank (PDB). First, we collected and compiled data sets from the literature used for developing bioinformatics methods to annotate the structure and function of proteins. Second, data sets were derived from the latest release of PDB using standard protocols. Third, we developed a powerful module for creating a wide range of customized data sets from the current release of PDB. This is a flexible module that allows users to create data sets using a simple six-step procedure. In addition, a number of web services have been integrated in ccPDB, which include submission of jobs on PDB-based servers, annotation of protein structures and generation of patterns. This database maintains >30 types of data sets such as secondary structure, tight-turns, nucleotide interacting residues, metal interacting residues, DNA/RNA binding residues and so on.
PMCID: PMC3245168  PMID: 22139939
22.  ProGlycProt: a repository of experimentally characterized prokaryotic glycoproteins 
Nucleic Acids Research  2011;40(Database issue):D388-D393.
ProGlycProt is an open access, manually curated, comprehensive repository of bacterial and archaeal glycoproteins with at least one experimentally validated glycosite (glycosylated residue). To bring as much information as possible together in one place, the database is arranged in two sections: (i) ProCGP, the main data section, consisting of 95 entries with experimentally characterized glycosites, and (ii) ProUGP, a supplementary data section containing 245 entries with experimentally identified glycosylation but uncharacterized glycosites. Every entry in the database is fully cross-referenced and enriched with available published information about source organism, coding gene, protein, glycosites, glycosylation type, attached glycan, associated oligosaccharyl/glycosyl transferases (OSTs/GTs), supporting references, and applicable additional information. Interestingly, ProGlycProt contains as many as 174 entries for which information is unavailable or the characterized glycosites are unannotated in Swiss-Prot release 2011_07. The website supports a dedicated structure gallery of homology models and crystal structures of characterized glycoproteins, in addition to two new tools developed in view of emerging information about prokaryotic sequons (conserved sequences of amino acids around glycosites) that are never or rarely seen in eukaryotic glycoproteins. ProGlycProt provides an extensive compilation of experimentally identified glycosites (334) and glycoproteins (340) of prokaryotes that could serve as an information resource for research and technology applications in glycobiology.
PMCID: PMC3245024  PMID: 22039152
23.  HIVsirDB: A Database of HIV Inhibiting siRNAs 
PLoS ONE  2011;6(10):e25917.
Human immunodeficiency virus (HIV) is responsible for millions of deaths every year. The current treatment involves the use of multiple antiretroviral agents that may harm patients due to their toxic nature. RNA interference (RNAi), which uses short interfering RNA (siRNA/shRNA) to silence HIV genes, is a potent candidate for the future treatment of HIV. In this study, we attempted to create HIVsirDB, a database of siRNAs responsible for silencing HIV genes.
HIVsirDB is a manually curated database of HIV inhibiting siRNAs that provides comprehensive information about each siRNA or shRNA. Information was collected and compiled from literature and public resources. This database contains around 750 siRNAs, including 75 partially complementary siRNAs that differ from their target sites by one or more bases and over 100 escape mutant sequences. The HIVsirDB structure contains sixteen fields, including siRNA sequence, HIV strain, targeted genome region, efficacy and conservation of target sequences. To assist users, several tools have been integrated into this database: i) siRNAmap for mapping siRNAs on a target sequence, ii) HIVsirblast for BLAST search against the database, and iii) siRNAalign for aligning siRNAs.
HIVsirDB is a freely accessible database of siRNAs that can silence or degrade HIV genes. It covers 26 types of HIV strains and 28 cell types. This database will be very useful for developing models for predicting efficacy of HIV inhibiting siRNAs. In summary, this is a useful resource for researchers working in the field of siRNA-based HIV therapy. HIVsirDB is accessible at
PMCID: PMC3191155  PMID: 22022467
24.  Identification of Mannose Interacting Residues Using Local Composition 
PLoS ONE  2011;6(9):e24039.
Mannose binding proteins (MBPs) play a vital role in several biological functions such as defense mechanisms. These proteins bind to mannose on the surface of a wide range of pathogens and help in eliminating these pathogens from our body. Thus, it is important to identify mannose interacting residues (MIRs) in order to understand the mechanism by which MBPs recognize pathogens.
This paper describes modules developed for predicting MIRs in a protein. Support vector machine (SVM) based models have been developed on 120 mannose binding protein chains, where no two chains have more than 25% sequence similarity. SVM models were developed on two types of datasets: 1) a main dataset consisting of 1029 mannose interacting and 1029 non-interacting residues, and 2) a realistic dataset consisting of 1029 mannose interacting and 10320 non-interacting residues. In this study, we first developed standard modules using binary and PSSM profiles of patterns and obtained a maximum MCC of around 0.32. Second, we developed SVM modules using the composition profile of patterns and achieved a maximum MCC of around 0.74 with accuracy 86.64% on the main dataset. Third, we developed a model on the realistic dataset and achieved a maximum MCC of 0.62 with accuracy 93.08%. Based on this study, a standalone program and web server have been developed for predicting mannose interacting residues in proteins.
Compositional analysis of mannose interacting and non-interacting residues shows that certain types of residues are preferred in mannose interaction. It was also observed that the neighbors of mannose interacting residues show a preference for certain types of residues. The composition of patterns (peptides/segments) was used for predicting MIRs and achieved reasonably high accuracy. It is possible that this novel strategy may be effective in predicting other types of interacting residues. This study will be useful in annotating the function of proteins as well as in understanding the role of mannose in the immune system.
PMCID: PMC3172211  PMID: 21931639
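The mannose interacting residue abstract above describes features built from the "composition profile of patterns", that is, the amino acid composition of a sequence window centered on each residue. The sketch below shows one plausible way to compute such features; it is not the authors' code. The window length of 9, the "X" padding at the chain termini and all function names are assumptions, since the abstract does not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def extract_pattern(sequence, center, window=9):
    """Return the window of residues centered on index `center`.

    Assumption: termini are padded with 'X' so every residue gets a
    full-length pattern.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the pattern has a center")
    half = window // 2
    padded = "X" * half + sequence + "X" * half
    return padded[center:center + window]


def pattern_composition(pattern):
    """Return a 20-dimensional vector of amino acid fractions for a pattern.

    Padding characters count toward the pattern length but not toward
    any of the 20 residue fractions.
    """
    return [pattern.count(aa) / len(pattern) for aa in AMINO_ACIDS]


def composition_profile(sequence, window=9):
    """Return one composition vector per residue in the chain."""
    return [pattern_composition(extract_pattern(sequence, i, window))
            for i in range(len(sequence))]


if __name__ == "__main__":
    chain = "MKTAYIAKQRQISFVK"  # illustrative sequence, not from the dataset
    profile = composition_profile(chain)
    print(len(profile), len(profile[0]))
```

Each vector could then be paired with an interacting or non-interacting label and passed to a classifier such as an SVM; kernel choice and parameters are not stated in the abstract and are not sketched here.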
25.  Designing of Highly Effective Complementary and Mismatch siRNAs for Silencing a Gene 
PLoS ONE  2011;6(8):e23443.
In the past, numerous methods have been developed for predicting the efficacy of short interfering RNA (siRNA). However, these methods were developed for predicting the efficacy of fully complementary siRNA against a gene. To the best of the authors' knowledge, no method has been developed for predicting the efficacy of mismatch siRNA against a gene. In this study, a systematic attempt has been made to identify highly effective complementary as well as mismatch siRNAs for silencing a gene.
Support vector machine (SVM) based models have been developed for predicting the efficacy of siRNAs using composition, binary and hybrid patterns of siRNAs. We achieved a maximum correlation of 0.67 between predicted and actual efficacy of siRNAs using the hybrid model. All models were trained and tested on a dataset of 2182 siRNAs, and performance was evaluated using five-fold cross-validation. The performance of our method, desiRm, is comparable to other well-known methods. In this study, an attempt has been made for the first time to design mutant siRNAs (mismatch siRNAs). In this approach, we mutated a given siRNA at all possible positions with all possible nucleotides. The efficacy of each mutated siRNA is predicted using our method desiRm. It is well known from the literature that mismatches between siRNA and target affect silencing efficacy. Thus, we have incorporated rules derived from experimental data on base mismatches to determine the overall efficacy of mutated or mismatch siRNAs. Finally, we developed a web server, desiRm, for designing highly effective siRNA for silencing a gene. This tool will be helpful for designing siRNA that degrades the disease isoform of a gene carrying a heterozygous single nucleotide polymorphism without depleting the wild-type protein.
PMCID: PMC3154470  PMID: 21853133
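The desiRm abstract above describes mutating a given siRNA at every position with every possible nucleotide before scoring each mutant. The enumeration step can be sketched as follows. This is an illustration under stated assumptions, not the desiRm implementation: the function name is hypothetical, sequences are assumed to be written in the RNA alphabet (T is converted to U), and efficacy prediction and the mismatch rules are not modelled.

```python
NUCLEOTIDES = "ACGU"


def single_mismatch_variants(sirna):
    """Yield every single-position mutant of an siRNA.

    Each item is (position, original_base, new_base, mutant_sequence),
    with 0-based positions. A sequence of length n made of A, C, G and U
    yields 3 * n mutants.
    """
    sirna = sirna.upper().replace("T", "U")  # assumption: accept DNA-style input
    for pos, original in enumerate(sirna):
        for new in NUCLEOTIDES:
            if new != original:
                mutant = sirna[:pos] + new + sirna[pos + 1:]
                yield pos, original, new, mutant


if __name__ == "__main__":
    example = "ACGUACGUACGUACGUACG"  # illustrative 19-mer, not from the dataset
    variants = list(single_mismatch_variants(example))
    print(len(variants))
    for pos, original, new, mutant in variants[:3]:
        print(pos, original, new, mutant)
```

In a full pipeline, each mutant would be passed to an efficacy predictor and then adjusted with position-dependent mismatch rules; both are outside the scope of this sketch.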
